# How to add a new peripheral

This document describes the process of adding a new peripheral to the pinmux and auto-generator, through worked examples, adding SDRAM and eMMC. First to be covered is SDRAM.

# Adding a Fast Peripheral

This section covers how to add a peripheral that is intended to go onto the "Fast" peripheral bus.

## Creating the specifications

The tool is split into two halves that are separated by tab-separated files. The first step is therefore to add a function that defines the peripheral as a python function. That implies in turn that the pinouts of the peripheral must be known. Looking at the BSV code for the SDRAM peripheral, we find its interface is defined as follows:

    interface Ifc_sdram_out;
        (*always_enabled,always_ready*)
        method Action ipad_sdr_din(Bit#(64) pad_sdr_din);
        method Bit#(9) sdram_sdio_ctrl();
        method Bit#(64) osdr_dout();
        method Bit#(8) osdr_den_n();
        method Bool osdr_cke();
        method Bool osdr_cs_n();
        method Bool osdr_ras_n ();
        method Bool osdr_cas_n ();
        method Bool osdr_we_n ();
        method Bit#(8) osdr_dqm ();
        method Bit#(2) osdr_ba ();
        method Bit#(13) osdr_addr ();
        interface Clock sdram_clk;
    endinterface

Also note that, further down, the code to map (for example) the 8 actual dqm pins into a single 8-bit interface has also been auto-generated. Generally it is a good idea to verify that the three data in/out/outen interfaces have correspondingly been generated correctly.

    interface sdr = interface PeripheralSideSDR
        ...
        ...
        interface dqm = interface Put#(8)
            method Action put(Bit#(8) in);
                wrsdr_sdrdqm0 <= in[0];
                wrsdr_sdrdqm1 <= in[1];
                wrsdr_sdrdqm2 <= in[2];
                wrsdr_sdrdqm3 <= in[3];
                wrsdr_sdrdqm4 <= in[4];
                wrsdr_sdrdqm5 <= in[5];
                wrsdr_sdrdqm6 <= in[6];
                wrsdr_sdrdqm7 <= in[7];
            endmethod
        endinterface;
    endinterface;

So now we go to src/spec/pinfunctions.py and add a corresponding function that returns a list of all of the required pin signals.
However, we note that it is a huge number of pins so a decision is made to split it into groups: sdram1, sdram2 and sdram3. Firstly, sdram1, covering the base functionality:

    def sdram1(suffix, bank):
        buspins = []
        inout = []
        for i in range(8):
            pname = "SDRDQM%d*" % i
            buspins.append(pname)
        for i in range(8):
            pname = "SDRD%d*" % i
            buspins.append(pname)
            inout.append(pname)
        for i in range(12):
            buspins.append("SDRAD%d+" % i)
        for i in range(2):
            buspins.append("SDRBA%d+" % i)
        buspins += ['SDRCKE+', 'SDRRASn+', 'SDRCASn+', 'SDRWEn+',
                    'SDRCSn0++']
        return (buspins, inout)

This function, if used on its own, would define an 8-bit SDRAM bus with 12-bit addressing. Checking off the names against the corresponding BSV definition we find that most of them are straightforward. Outputs must have a "+" after the name (in the python representation), inputs must have a "-".

However we run smack into an interesting brick-wall with the in/out pins. In/out pins which are routed through the same IO pad need a *triplet* of signals: one input wire, one output wire and *one direction control wire*. Here however we find that the SDRAM controller, which is a wrapper around the opencores SDRAM controller, has a *banked* approach to direction-control that will need to be dealt with, later. So we do *not* make the mistake of adding 8 SDRDENx pins: the BSV code will need to be modified to add 64 one-for-one enabling pins. Nor do we make the mistake of adding separate unidirectional "in" and separate unidirectional "out" signals under different names, as the pinmux code is a *PAD* centric tool.
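The suffix convention can be illustrated with a short sketch. The helper `classify_pin` is hypothetical and is not part of pinfunctions.py. Only the "+" (output) and "-" (input) markers are stated above; treating "*" as the marker for a bidirectional pad is an assumption made here, based on its use on the SDRD data pins.

```python
def classify_pin(spec):
    """Split a pin spec such as 'SDRAD0+' into (name, direction).

    Hypothetical helper for illustration only, not part of the tool.
    '+' (output) and '-' (input) follow the convention described in
    the text; '*' meaning "inout" is an assumption.
    """
    directions = {'+': 'out', '-': 'in', '*': 'inout'}
    name = spec.rstrip('+-*')                # tolerates a doubled marker
    marker = spec[len(name):len(name) + 1]   # first marker character
    if marker not in directions:
        raise ValueError("no direction marker on %r" % spec)
    return name, directions[marker]


for spec in ['SDRAD0+', 'SDRCKE+', 'SDRD0*']:
    print(classify_pin(spec))
```

A check of this kind is a cheap way to catch a missing or mistyped marker before the specification is fed to the generator.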
The second function extends the 8-bit data bus to 64-bits, and extends the address lines to 13-bit wide:

    def sdram3(suffix, bank):
        buspins = []
        inout = []
        for i in range(12, 13):
            buspins.append("SDRAD%d+" % i)
        for i in range(8, 64):
            pname = "SDRD%d*" % i
            buspins.append(pname)
            inout.append(pname)
        return (buspins, inout)

In this way, alternative SDRAM controller implementations can use sdram1 on its own; implementors may add "extenders" (named sdram2, sdram4) that cover extra functionality, and, interestingly, in a pinbank scenario, the number of pins on any given GPIO bank may be kept to a sane level.

The next phase is to add the (now supported) peripheral to the list of pinspecs at the bottom of the file, so that it can actually be used:

    pinspec = (('IIS', i2s),
               ('MMC', emmc),
               ('FB', flexbus1),
               ('FB', flexbus2),
               ('SDR', sdram1),
               ('SDR', sdram2),
               ('SDR', sdram3), <---
               ('EINT', eint),
               ('PWM', pwm),
               ('GPIO', gpio),
               )

This gives a declaration that any time the function(s) starting with "sdram" are used to add pins to a pinmux, it will be part of the "SDR" peripheral. Note that flexbus is similarly subdivided into two groups.

Note however that due to a naming convention issue, interfaces must be declared with names that are lexicographically unique even in subsets of their names: no interface name may be a leading substring of another. For example, two interfaces, one named "SD" (shorthand for SDMMC) and another named "SDRAM", may *not* both be added: the first has to be the full "SDMMC" or be renamed to "MMC".

## Adding the peripheral to a chip's pinmux specification

Next, we add the peripheral to an actual chip's specification. In this case it is to be added to i\_class, so we open src/spec/i\_class.py. The first thing to do is to add a single-mux (dedicated) bank of 92 pins (!) which covers all of the 64-bit Data lines, 13 addresses and supporting bank-selects and control lines.
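The figure of 92 pins can be cross-checked by calling the two functions and counting what they return. The sketch below repeats sdram1 and sdram3 as listed in this document so that it is self-contained; nothing else from pinfunctions.py is assumed.

```python
def sdram1(suffix, bank):
    # base functionality: 8 DQM, 8 data, 12 address, 2 bank-select, 5 control
    buspins = []
    inout = []
    for i in range(8):
        buspins.append("SDRDQM%d*" % i)
    for i in range(8):
        pname = "SDRD%d*" % i
        buspins.append(pname)
        inout.append(pname)
    for i in range(12):
        buspins.append("SDRAD%d+" % i)
    for i in range(2):
        buspins.append("SDRBA%d+" % i)
    buspins += ['SDRCKE+', 'SDRRASn+', 'SDRCASn+', 'SDRWEn+', 'SDRCSn0++']
    return (buspins, inout)


def sdram3(suffix, bank):
    # extender: 13th address line plus data lines 8..63
    buspins = []
    inout = []
    for i in range(12, 13):
        buspins.append("SDRAD%d+" % i)
    for i in range(8, 64):
        pname = "SDRD%d*" % i
        buspins.append(pname)
        inout.append(pname)
    return (buspins, inout)


base_pins, _ = sdram1("", "D")
ext_pins, _ = sdram3("", "D")
print(len(base_pins), len(ext_pins), len(base_pins) + len(ext_pins))
# prints: 35 57 92
```

The 35 pins of sdram1 and the 57 pins of sdram3 together account for the full 92-pin dedicated bank.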
The new bank is added as Bank "D", the next contiguous bank:

    def pinspec():
        pinbanks = {
            'A': (28, 4),
            'B': (18, 4),
            'C': (24, 1),
            'D': (92, 1),     <---
        }
        fixedpins = {
            'CTRL_SYS': [

This declares the width of the pinmux for Bank D to be one (a dedicated peripheral bank). Note in passing that A and B are both 4-entry. Next, an SDRAM interface is conveniently added to the chip's pinmux with two simple lines of code:

    ps.gpio("", ('B', 0), 0, 0, 18)
    ps.flexbus1("", ('B', 0), 1, spec=flexspec)

    ps.flexbus2("", ('C', 0), 0)

    ps.sdram1("", ('D', 0), 0)    <--
    ps.sdram3("", ('D', 35), 0)   <--

Note that the first argument is blank, indicating that this is the only SDRAM interface to be added. If more than one SDRAM interface is desired they would be numbered from 0 and identified by their suffix. The second argument is a tuple of (Bank Name, Bank Row Number), and the third argument is the pinmux column (which in this case must be zero).

At the top level the following command is then run:

    $ python src/pinmux_generator.py -o i_class -s i_class

The output may be found in the ./i\_class subdirectory, and it is worth examining the i\_class.mdwn file. A table named "Bank D" will have been created and it is worth just showing the first few entries here:

| Pin | Mux0        | Mux1        | Mux2        | Mux3        |
| --- | ----------- | ----------- | ----------- | ----------- |
|  70 | D SDR_SDRDQM0 |
|  71 | D SDR_SDRDQM1 |
|  72 | D SDR_SDRDQM2 |
|  73 | D SDR_SDRDQM3 |
|  74 | D SDR_SDRDQM4 |
|  75 | D SDR_SDRDQM5 |
|  76 | D SDR_SDRDQM6 |
|  77 | D SDR_SDRDQM7 |
|  78 | D SDR_SDRD0 |
|  79 | D SDR_SDRD1 |
|  80 | D SDR_SDRD2 |
|  81 | D SDR_SDRD3 |
|  82 | D SDR_SDRD4 |
|  83 | D SDR_SDRD5 |
|  84 | D SDR_SDRD6 |
|  85 | D SDR_SDRD7 |
|  86 | D SDR_SDRAD0 |
|  87 | D SDR_SDRAD1 |

Returning to the definition of sdram1 and sdram3, this table clearly corresponds to the functions in src/spec/pinfunctions.py which is exactly what we want. It is however extremely important to verify this.

Lastly, the peripheral is a "fast" peripheral, i.e.
it must not be added to the "slow" peripherals AXI4-Lite Bus, so it must be added to the list of "fast" peripherals, here:

    ps = PinSpec(pinbanks, fixedpins, function_names,
                 ['lcd', 'jtag', 'fb', 'sdr'])      <--

    # Bank A, 0-27
    ps.gpio("", ('A', 0), 0, 0, 28)

This basically concludes the first stage of adding a peripheral to the pinmux / autogenerator tool. It allows peripherals to be assessed for viability prior to actually committing the engineering resources to their deployment.

## Adding the code auto-generators

With the specification now created and well-defined (and now including the SDRAM interface), the next completely separate phase is to auto-generate the code that will drop an SDRAM instance onto the fabric of the SoC.

This particular peripheral is classified as a "Fast Bus" peripheral. "Slow" peripherals will need to be the specific topic of an alternative document, however the principles are the same.

The first requirement is that the pins from the peripheral side be connected through to IO cells. This can be verified by running the pinmux code generator (to activate "default" behaviour), just to see what happens:

    $ python src/pinmux_generator.py -o i_class

Files are auto-generated in ./i\_class/bsv\_src and it is recommended to examine the pinmux.bsv file in an editor, and search for occurrences of the string "sdrd63". It can clearly be seen that an interface named "PeripheralSideSDR" has been auto-generated:

    // interface declaration between SDR and pinmux
    (*always_ready,always_enabled*)
    interface PeripheralSideSDR;
        interface Put#(Bit#(1)) sdrdqm0;
        interface Put#(Bit#(1)) sdrdqm1;
        interface Put#(Bit#(1)) sdrdqm2;
        interface Put#(Bit#(1)) sdrdqm3;
        interface Put#(Bit#(1)) sdrdqm4;
        interface Put#(Bit#(1)) sdrdqm5;
        interface Put#(Bit#(1)) sdrdqm6;
        interface Put#(Bit#(1)) sdrdqm7;
        interface Put#(Bit#(1)) sdrd0_out;
        interface Put#(Bit#(1)) sdrd0_outen;
        interface Get#(Bit#(1)) sdrd0_in;
        ....
        ....
    endinterface

Note that for the data lines, where in the sdram1 specification function the signals were named "SDRDn*", out, out-enable *and* in interfaces/methods have been created, as these will be *directly* connected to the I/O pads.

Further down the file we see the *actual* connection to the I/O pad (cell). An example:

    // --------------------
    // ----- cell 161 -----

    // output muxer for cell idx 161
    cell161_mux_out=
        wrsdr_sdrd63_out;

    // outen muxer for cell idx 161
    cell161_mux_outen=
        wrsdr_sdrd63_outen; // bi-directional

    // priority-in-muxer for cell idx 161

    rule assign_wrsdr_sdrd63_in_on_cell161;
        wrsdr_sdrd63_in<=cell161_mux_in;
    endrule

Here, given that this is a "dedicated" cell (with no muxing), we have *direct* assignment of all three signals (in, out, outen). 2-way, 3-way and 4-way muxing creates the required priority-muxing for inputs and straight-muxing for outputs, however in this instance, a deliberate pragmatic decision is being taken not to put 92 pins of 133MHz+ signalling through muxing.

### Making the peripheral a "MultiBus" peripheral

The sheer number of signals coming out of PeripheralSideSDR is so unwieldy that something has to be done. We therefore create a "MultiBus" interface such that the pinmux knows which pins are grouped together by name. This is done in src/bsv/interface\_decl.py.

The MultiBus code is quite sophisticated, in that buses can be identified by pattern, and removed one by one. The *remaining* pins are left behind as individual single-bit pins.
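The idea of identifying buses by pattern and removing them one by one can be sketched as follows. This is a minimal illustration that assumes simple prefix matching on lower-cased pin names; `group_buses` is a hypothetical function and is not the InterfaceMultiBus implementation.

```python
def group_buses(pins, prefixes):
    """Group pins into buses by prefix, in the order the prefixes are given.

    Returns (buses, remaining): buses maps each prefix to the number of
    pins it claimed; remaining lists the pins no prefix claimed.
    Hypothetical sketch only, assuming plain prefix matching.
    """
    remaining = list(pins)
    buses = {}
    for prefix in prefixes:
        claimed = set(p for p in remaining if p.startswith(prefix))
        buses[prefix] = len(claimed)
        # claimed pins are removed so that later prefixes cannot see them
        remaining = [p for p in remaining if p not in claimed]
    return buses, remaining


pins = (["sdrdqm%d" % i for i in range(8)] +
        ["sdrd%d" % i for i in range(64)] +
        ["sdrad%d" % i for i in range(13)] +
        ["sdrba%d" % i for i in range(2)] +
        ["sdrcke", "sdrrasn", "sdrcasn", "sdrwen", "sdrcsn0"])

buses, singles = group_buses(pins, ["sdrdqm", "sdrd", "sdrad", "sdrba"])
print(buses)    # {'sdrdqm': 8, 'sdrd': 64, 'sdrad': 13, 'sdrba': 2}
print(singles)  # ['sdrcke', 'sdrrasn', 'sdrcasn', 'sdrwen', 'sdrcsn0']
```

With plain prefix matching the order of the prefixes matters: if "sdrd" were processed before "sdrdqm" it would also claim the eight dqm pins, giving a 72-bit "sdrd" bus and an empty "sdrdqm" one.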
Starting from InterfaceFlexBus as the most similar code, a cut/paste copy is taken and the new class InterfaceSDRAM created:

    class InterfaceSDRAM(InterfaceMultiBus, Interface):

        def __init__(self, ifacename, pinspecs, ganged=None, single=False):
            Interface.__init__(self, ifacename, pinspecs, ganged, single)
            InterfaceMultiBus.__init__(self, self.pins)
            self.add_bus(False, ['dqm', None, None],
                         "Bit#({0})", "sdrdqm")
            self.add_bus(True, ['d_out', 'd_out_en', 'd_in'],
                         "Bit#({0})", "sdrd")
            self.add_bus(False, ['ad', None, None],
                         "Bit#({0})", "sdrad")
            self.add_bus(False, ['ba', None, None],
                         "Bit#({0})", "sdrba")

        def ifacedef2(self, *args):
            return InterfaceMultiBus.ifacedef2(self, *args)

Here, annoyingly, the data bus is a mess, requiring identification of the three separate names for in, out and outen. The prefix "sdrd" is however unique and obvious in its purpose: anything beginning with "sdrd" is treated as a multi-bit bus, and a template for declaring a BSV type is given that is automatically passed the numerical quantity of pins detected that start with the word "sdrd".

Note that it is critical to lexicographically identify pins correctly, so sdrdqm is done **before** sdrd.

Once the buses have been identified the peripheral can be added into class Interfaces:

    class Interfaces(InterfacesBase, PeripheralInterfaces):
        """ contains a list of interface definitions
        """

        def __init__(self, pth=None):
            InterfacesBase.__init__(self, Interface, pth,
                                    {'gpio': InterfaceGPIO,
                                     'fb': InterfaceFlexBus,
                                     'sdr': InterfaceSDRAM,      <--

Running the tool again results in a much smaller, tidier output that will be a lot less work, later. Note the automatic inclusion of the correct length multi-bit interfaces. d-out/in/out-en is identified as 64-bit, ad is identified as 13-bit, ba as 2 and dqm as 8.
    // interface declaration between SDR and pinmux
    (*always_ready,always_enabled*)
    interface PeripheralSideSDR;
        interface Put#(Bit#(1)) sdrcke;
        interface Put#(Bit#(1)) sdrrasn;
        interface Put#(Bit#(1)) sdrcasn;
        interface Put#(Bit#(1)) sdrwen;
        interface Put#(Bit#(1)) sdrcsn0;

        interface Put#(Bit#(8)) dqm;
        interface Put#(Bit#(64)) d_out;
        interface Put#(Bit#(64)) d_out_en;
        interface Get#(Bit#(64)) d_in;
        interface Put#(Bit#(13)) ad;
        interface Put#(Bit#(2)) ba;

    endinterface

### Adding the peripheral

In examining the slow\_peripherals.bsv file, there should at this stage be no sign of an SDRAM peripheral having been added, at all. This is because it is missing from the peripheral\_gen side of the tool.

However, as the slow\_peripherals module takes care of the IO cells (because it contains a declared and configured instance of the pinmux package), signals from the pinmux PeripheralSideSDR instance need to be passed *through* the slow peripherals module as an external interface. This will happen automatically once a code-generator class is added.

So first, we must identify the nearest similar class. FlexBus looks like a good candidate, so we take a copy of src/bsv/peripheral\_gen/flexbus.py called sdram.py. The simplest next step is to global/search/replace "flexbus" with "sdram", and for peripheral instance declaration replace "fb" with "sdr". At this phase, despite knowing that it will auto-generate the wrong code, we add it as a "supported" peripheral at the bottom of src/bsv/peripheral\_gen/base.py, in the "PFactory" (Peripheral Factory) class:

    from gpio import gpio
    from rgbttl import rgbttl
    from flexbus import flexbus
    from sdram import sdram             <--

    for k, v in {'uart': uart,
                 'rs232': rs232,
                 'sdr': sdram,          <--
                 'twi': twi,
                 'quart': quart,

Note that the name "SDR" matches with the prefix used in the pinspec declaration, back in src/spec/pinfunctions.py, except lower-cased.
Once this is done, and the auto-generation tool re-run, examining the slow\_peripherals.bsv file again shows the following (correct) and only the following (correct) additions:

    method Bit#(1) quart0_intr;
    method Bit#(1) quart1_intr;
    interface GPIO_config#(28) pad_configa;
    interface PeripheralSideSDR sdr0;       <--
    interface PeripheralSideFB fb0;

    ....
    ....
    interface iocell_side=pinmux.iocell_side;
    interface sdr0 = pinmux.peripheral_side.sdr;    <--
    interface fb0 = pinmux.peripheral_side.fb;

These automatically-generated declarations are sufficient to "pass through" the SDRAM "Peripheral Side", which as we know from examination of the code is directly connected to the relevant IO pad cells, so that the *actual* peripheral may be declared in the "fast" fabric and connected up to the relevant and required "fast" bus.

### Connecting in the fabric

Now we can begin the process of systematically inserting the correct "voodoo magic" incantations that, as far as this auto-generator tool is concerned, are just bits of ASCII text. In this particular instance, an SDRAM peripheral happened to already be *in* the SoC's BSV source code, such that the process of adding it to the tool is primarily one of *conversion*.

**Please note that it is NOT recommended to do two tasks at once. It is strongly recommended to add any new peripheral to a pre-existing verified project, manually, by hand, and ONLY then to carry out a conversion process to have this tool understand how to auto-generate the fabric**

So, whilst examining the i\_class socgen.bsv file, we also open up src/bsv/bsv\_lib/soc\_template.bsv in side-by-side windows of maximum 80 characters in width each. Code that *respects the 80-character coding convention for this exact purpose* can easily fit two such windows side-by-side, *as well as* a third containing the source code files that turn that same template into its corresponding output.

We can now begin by searching for strings "SDRAM" and "sdr" in both the template and the auto-generated socgen.bsv file.
The first such encounter is the import, in the template:

    `ifdef BOOTROM
        import BootRom ::*;
    `endif
    `ifdef SDRAM                <-- xxxx
        import sdr_top :: *;    <-- xxxx
    `endif                      <-- xxxx
    `ifdef BRAM

This we can **remove**, and drop the corresponding code-fragment into the sdram slowimport function:

    class sdram(PBase):

        def slowimport(self):
            return "import sdr_top::*;"     <--

        def num_axi_regs32(self):

Now we re-run the auto-generator tool and confirm that, indeed, the ifdef'd code is gone and replaced with an unconditional import:

    import mqspi :: *;
    import sdr_top::*;          <--
    import Uart_bs :: *;
    import RS232_modified::*;
    import mspi :: *;

Progress! Next, we examine the instance declaration clause. Remember that we cut/paste the flexbus class, so we are expecting to find code that declares the sdr0 instance as a FlexBus peripheral. We are also looking for the hand-created code that is to be *replaced*. Sure enough:

    AXI4_Slave_to_FlexBus_Master_Xactor_IFC #(`PADDR, `DATA, `USERSPACE)
                sdr0 <- mkAXI4_Slave_to_FlexBus_Master_Xactor;     <--
    AXI4_Slave_to_FlexBus_Master_Xactor_IFC #(`PADDR, `DATA, `USERSPACE)
                fb0 <- mkAXI4_Slave_to_FlexBus_Master_Xactor;
    ...
    ...
    `ifdef BOOTROM
        BootRom_IFC bootrom <-mkBootRom;
    `endif
    `ifdef SDRAM                                            <--
        Ifc_sdr_slave sdram<- mksdr_axi4_slave(clk0);       <--
    `endif                                                  <--

So, the mksdr\_axi4\_slave call we *remove* from the template and cut/paste it into the sdram class's mkfast_peripheral function, making sure to substitute the hard-coded instance name "sdram" with a python-formatted template that can insert numerical instance identifiers, should it ever be desired that there be more than one SDRAM peripheral put into a chip:

    class sdram(PBase):

        ...
        ...
        def mkfast_peripheral(self):
            return "Ifc_sdr_slave sdr{0} <- mksdr_axi4_slave(clk0);"

Re-run the tool and check that the correct-looking code has been created:

    Ifc_sdr_slave sdr0 <- mksdr_axi4_slave(clk0);       <--
    AXI4_Slave_to_FlexBus_Master_Xactor_IFC #(`PADDR, `DATA, `USERSPACE)
                fb0 <- mkAXI4_Slave_to_FlexBus_Master_Xactor;
    Ifc_rgbttl_dummy lcd0 <- mkrgbttl_dummy();

The next thing to do: searching for the string "sdram\_out" shows that the original hand-generated code contains (contained) a declaration of the SDRAM Interface, presumably to which, when compiling to run on an FPGA, the SDRAM interface would be connected at the top level. Through this interface, connections would be done *by hand* to the IO pads, whereas now they are to be connected *automatically* (on the peripheral side) to the IO pads in the pinmux. However, at the time of writing this is not fully understood by the author, so the fastifdecl and extfastifinstance functions are modified to generate the correct output but the code is *commented out*:

    def extfastifinstance(self, name, count):
        return "// TODO" + self._extifinstance(name, count, "_out", "", True,
                                   ".if_sdram_out")

    def fastifdecl(self, name, count):
        return "// (*always_ready*) interface " + \
                "Ifc_sdram_out sdr{0}_out;".format(count)

Also the corresponding (old) manual declarations of sdram\_out are removed from the template:

    `ifdef SDRAM                                            <-- xxxx
        (*always_ready*) interface Ifc_sdram_out sdram_out; <-- xxxx
    `endif                                                  <-- xxxx
    ...
    ...
    `ifdef SDRAM                                    <--- xxxx
        interface sdram_out=sdram.ifc_sdram_out;    <--- xxxx
    `endif                                          <--- xxxx

Next, again searching for signs of the "hand-written" code, we encounter the fabric connectivity, which wires the SDRAM to the AXI4. We note however that there is not just one AXI slave device but *two*: one for the SDRAM itself and one for *configuring* the SDRAM. We therefore need to be quite careful about assigning these, as will be subsequently explained.
First however, the two AXI4 slave interfaces of this peripheral are declared:

    class sdram(PBase):

        ...
        ...

        def _mk_connection(self, name=None, count=0):
            return ["sdr{0}.axi4_slave_sdram",
                    "sdr{0}.axi4_slave_cntrl_reg"]

Note that, again, in case multiple instances are ever to be added, the python "format" string "{0}" is inserted so that it can be substituted with the numerical identifier suffix. Also note that the order of declaration of these two AXI4 slaves is **important**.

Re-running the auto-generator tool, we note the following output has been created, and match it against the corresponding hand-generated (old) code:

    `ifdef SDRAM
        mkConnection (fabric.v_to_slaves
            [fromInteger(valueOf(Sdram_slave_num))],
            sdram.axi4_slave_sdram); //
        mkConnection (fabric.v_to_slaves
            [fromInteger(valueOf(Sdram_cfg_slave_num))],
            sdram.axi4_slave_cntrl_reg); //
    `endif

    // fabric connections
    mkConnection (fabric.v_to_slaves
                [fromInteger(valueOf(SDR0_fastslave_num))],
                sdr0.axi4_slave_sdram);
    mkConnection (fabric.v_to_slaves
                [fromInteger(valueOf(SDR0_fastslave_num))],
                sdr0.axi4_slave_cntrl_reg);

Immediately we can spot an issue: whilst the correctly-named slave(s) have been added, they have been added with the *same* fabric slave index. This is unsatisfactory and needs resolving.

Here we need to explain a bit more about what is going on. The fabric on an AXI4 Bus is allocated numerical slave numbers, and each slave is also allocated a memory-mapped region that must be resolved in a bi-directional fashion, i.e. whenever a particular memory region is accessed, the AXI slave peripheral responsible for dealing with it **must** be correctly identified. So this requires some further crucial information, which is the size of the region that is to be allocated to each slave device. Later this will be extended to being part of the specification, but for now it is auto-allocated based on the size.
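The requirement that every address resolves to exactly one slave amounts to a range lookup, which can be sketched in Python. The function names and the address ranges below are hypothetical placeholders chosen for illustration; they are not taken from the tool or from any generated memory map.

```python
def addr_to_slave(addr, regions):
    """Return (True, slave_num) for the region containing addr.

    regions is a list of inclusive (base, end) pairs, indexed by slave
    number.  Returns (False, 0) when no region matches.
    Hypothetical sketch only, not generated code.
    """
    for slave_num, (base, end) in enumerate(regions):
        if base <= addr <= end:
            return (True, slave_num)
    return (False, 0)


def regions_overlap(regions):
    """True if any two regions share an address (an ambiguous decode)."""
    ordered = sorted(regions)
    return any(prev_end >= base
               for (_, prev_end), (base, _) in zip(ordered, ordered[1:]))


# placeholder ranges: one large memory window, one small register window
regions = [(0x10000000, 0x1FFFFFFF),
           (0x20000000, 0x200000FF)]

print(addr_to_slave(0x10000040, regions))   # (True, 0)
print(addr_to_slave(0x20000010, regions))   # (True, 1)
print(addr_to_slave(0x30000000, regions))   # (False, 0)
print(regions_overlap(regions))             # False
```

Two slaves that share an index, or two regions that overlap, make the decode ambiguous, which is why each slave needs both its own number and its own non-overlapping range.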
As a huge hack, the region size is allocated in 32-bit chunks, as follows:

    class sdram(PBase):

        def num_axi_regs32(self):
            return [0x400000,  # defines an entire memory range (hack...)
                    12]        # defines the number of configuration regs

So after running the autogenerator again, to confirm that this has generated the correct code, we examine several files, starting with fast\_memory\_map.bsv:

    /*====== Fast peripherals Memory Map ======= */
    `define SDR0_0_Base 'h50000000
    `define SDR0_0_End  'h5FFFFFFF // 4194304 32-bit regs
    `define SDR0_1_Base 'h60000000
    `define SDR0_1_End  'h600002FF // 12 32-bit regs

This looks slightly awkward (and in need of an external specification section for addresses) but is fine: the range is 256MB ('h50000000 to 'h5FFFFFFF) for the main map and covers 12 32-bit registers for the SDR Config map. Next we note the slave numbering:

    typedef 0 SDR0_0__fastslave_num;
    typedef 1 SDR0_1__fastslave_num;
    typedef 2 FB0_fastslave_num;
    typedef 3 LCD0_fastslave_num;
    typedef 3 LastGen_fastslave_num;
    typedef TAdd#(LastGen_fastslave_num,1) Sdram_slave_num;
    typedef TAdd#(Sdram_slave_num ,`ifdef SDRAM 1 `else 0 `endif )
                  Sdram_cfg_slave_num;

Again this looks reasonable and we may subsequently (carefully! noting the use of the TAdd# chain!) remove the typedef for Sdram\_cfg\_slave\_num. The next phase is to examine the fn\_addr\_to\_fastslave\_num function, where we note that there were *two* hand-created sections previously, now joined by two *auto-generated* sections:

    function Tuple2 #(Bool, Bit#(TLog#(Num_Fast_Slaves)))
                     fn_addr_to_fastslave_num (Bit#(`PADDR) addr);

        if(addr>=`SDRAMMemBase && addr<=`SDRAMMemEnd)
            return tuple2(True,fromInteger(valueOf(Sdram_slave_num)));    <--
        else if(addr>=`DebugBase && addr<=`DebugEnd)
            return tuple2(True,fromInteger(valueOf(Debug_slave_num)));    <--
        `ifdef SDRAM
            else if(addr>=`SDRAMCfgBase && addr<=`SDRAMCfgEnd )
                return tuple2(True,fromInteger(valueOf(Sdram_cfg_slave_num)));
        `endif

        ...
        ...
        if(addr>=`SDR0_0_Base && addr<=`SDR0_0_End)    <--
            return tuple2(True,fromInteger(valueOf(SDR0_0__fastslave_num)));
        else if(addr>=`SDR0_1_Base && addr<=`SDR0_1_End)    <--
            return tuple2(True,fromInteger(valueOf(SDR0_1__fastslave_num)));
        else if(addr>=`FB0Base && addr<=`FB0End)

Now, here is where, in a slightly unusual set of circumstances, we cannot just remove all instances of this address / typedef from the template code. Looking in the shakti-core repository's src/lib/MemoryMap.bsv file, the SDRAMMemBase macro is used in the is\_IO\_Addr function. So as a really bad hack, which will need to be properly resolved, whilst the hand-generated sections from fast\_tuple2\_template.bsv are removed, and the corresponding (now redundant) defines in src/core/core\_parameters.bsv are commented out, some temporary defines to deal with the name change are also added:

    `define SDRAMMemBase SDR0_0_Base
    `define SDRAMMemEnd SDR0_0_End

This needs to be addressed (pun intended) including being able to specify the name(s) of the configuration parameters, as well as specifying which memory map range they must be added to.

Now however finally, after carefully comparing the hard-coded fabric connections to what was formerly named sdram, we may remove the mkConnections that drop sdram.axi4\_slave\_sdram and its associated cntrl reg from the soc\_template.bsv file.

### Connecting up the pins

We are still not done! It is however worth pointing out that if this peripheral were not wired into the pinmux, we would in fact be finished. However there is a task that (previously having been left to outside tools) now needs to be specified, which is to connect the sdram's pins, declared in this instance in Ifc\_sdram\_out, and the PeripheralSideSDR instance that was kindly / strategically / thoughtfully / absolutely-necessarily exported from slow\_peripherals for exactly this purpose.

Recall earlier that we took a cut/paste copy of the flexbus.py code.
If we now examine socgen.bsv we find that it contains connections to pins that match the FlexBus specification, not SDRAM. So, returning to the declaration of the Ifc\_sdram\_out interface, we first identify the single-bit output-only pins, and add a mapping table between them:

    class sdram(PBase):

        def pinname_out(self, pname):
            return {'sdrwen': 'ifc_sdram_out.osdr_we_n',
                    'sdrcsn0': 'ifc_sdram_out.osdr_cs_n',
                    'sdrcke': 'ifc_sdram_out.osdr_cke',
                    'sdrrasn': 'ifc_sdram_out.osdr_ras_n',
                    'sdrcasn': 'ifc_sdram_out.osdr_cas_n',
                    }.get(pname, '')

Re-running the tool confirms that the relevant mkConnections are generated:

    //sdr {'action': True, 'type': 'out', 'name': 'sdrcke'}
    mkConnection(slow_peripherals.sdr0.sdrcke,
                    sdr0_sdrcke_sync.get);
    mkConnection(sdr0_sdrcke_sync.put,
                    sdr0.ifc_sdram_out.osdr_cke);
    //sdr {'action': True, 'type': 'out', 'name': 'sdrrasn'}
    mkConnection(slow_peripherals.sdr0.sdrrasn,
                    sdr0_sdrrasn_sync.get);
    mkConnection(sdr0_sdrrasn_sync.put,
                    sdr0.ifc_sdram_out.osdr_ras_n);

Next, the multi-value entries are tackled (both in and out). At present the code is messy, as it does not automatically detect the multiple numerical declarations, nor that the entries are sometimes inout (in, out, outen), so it is *presently* done by hand:

    class sdram(PBase):

        def _mk_pincon(self, name, count, typ):
            ret = [PBase._mk_pincon(self, name, count, typ)]
            assert typ == 'fast'  # TODO slow?
            for pname, stype, ptype in [
                ('dqm', 'osdr_dqm', 'out'),
                ('ba', 'osdr_ba', 'out'),
                ('ad', 'osdr_addr', 'out'),
                ('d_out', 'osdr_dout', 'out'),
                ('d_in', 'ipad_sdr_din', 'in'),
                ('d_out_en', 'osdr_den_n', 'out'),
            ]:
                ret.append(self._mk_vpincon(name, count, typ, ptype, pname,
                                            "ifc_sdram_out.{0}".format(stype)))

This generates *one* mkConnection for each multi-entry pintype, and here we match up with the "InterfaceMultiBus" class from the specification side, where pin entries with numerically matching names were "grouped" into single multi-bit declarations.
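The shape of the single-bit output connections shown earlier in this section can be reproduced with a few lines of string formatting. The function `mk_out_connections` below is a hypothetical stand-in written for illustration; it is not the PBase implementation, and it only mirrors the two generated mkConnection lines quoted above (as single lines, without the generator's line wrapping).

```python
# mapping table copied from the pinname_out function shown above
PINNAME_OUT = {'sdrwen': 'ifc_sdram_out.osdr_we_n',
               'sdrcsn0': 'ifc_sdram_out.osdr_cs_n',
               'sdrcke': 'ifc_sdram_out.osdr_cke',
               'sdrrasn': 'ifc_sdram_out.osdr_ras_n',
               'sdrcasn': 'ifc_sdram_out.osdr_cas_n',
               }


def mk_out_connections(name, count, pname):
    """Build the two mkConnection lines for one single-bit output pin.

    Hypothetical helper: pinmux side -> sync.get, sync.put -> peripheral.
    """
    inst = "{0}{1}".format(name, count)            # e.g. "sdr0"
    sync = "{0}_{1}_sync".format(inst, pname)      # e.g. "sdr0_sdrcke_sync"
    target = PINNAME_OUT[pname]
    return ["mkConnection(slow_peripherals.{0}.{1}, {2}.get);".format(
                inst, pname, sync),
            "mkConnection({0}.put, {1}.{2});".format(sync, inst, target)]


for line in mk_out_connections("sdr", 0, "sdrcke"):
    print(line)
# mkConnection(slow_peripherals.sdr0.sdrcke, sdr0_sdrcke_sync.get);
# mkConnection(sdr0_sdrcke_sync.put, sdr0.ifc_sdram_out.osdr_cke);
```

Keeping the instance number as a parameter is what allows the same template to serve sdr0, sdr1 and so on, should more than one instance ever be added.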
### Adjusting the BSV Interface to a get/put style

For various reasons, related to BSV not permitting wires to be connected back-to-back inside the pinmux code, a get/put style of interface had to be used. This requirement has a knock-on effect up the chain into the actual peripheral code. So now the actual interface (Ifc\_sdram\_out) has to be converted. All straight methods (outputs) are converted to Get, and Action methods (inputs) converted to Put. Also, it is just plain sensible not to use Bool but to use Bit#, and for the pack / unpack to be carried out in the interface.

After conversion, the interface declaration looks like this:

    interface Ifc_sdram_out;
        (*always_enabled, always_ready*)
        interface Put#(Bit#(64)) ipad_sdr_din;
        interface Get#(Bit#(64)) osdr_dout;
        interface Get#(Bit#(64)) osdr_den_n;
        interface Get#(Bit#(1)) osdr_cke;
        interface Get#(Bit#(1)) osdr_cs_n;
        interface Get#(Bit#(1)) osdr_ras_n;
        interface Get#(Bit#(1)) osdr_cas_n;
        interface Get#(Bit#(1)) osdr_we_n;
        interface Get#(Bit#(8)) osdr_dqm;
        interface Get#(Bit#(2)) osdr_ba;
        interface Get#(Bit#(13)) osdr_addr;

        method Bit#(9) sdram_sdio_ctrl;
        interface Clock sdram_clk;
    endinterface

Note that osdr\_den\_n is now **64** bit **not** 8, as discussed above. The corresponding implementation of the interface, after conversion, looks like this:

    interface Ifc_sdram_out ifc_sdram_out;

        interface ipad_sdr_din = interface Put
            method Action put(Bit#(64) in)
                sdr_cntrl.ipad_sdr_din <= in;
            endmethod
        endinterface;

        interface osdr_dout = interface Get
            method ActionValue#(Bit#(64)) get;
                return sdr_cntrl.osdr_dout();
            endmethod
        endinterface;

        interface osdr_den_n = interface Get
            method ActionValue#(Bit#(64)) get;
                Bit#(64) temp;
                for (int i=0; i<8; i=i+1) begin
                    temp[i*8] = sdr_cntrl.osdr_den_n[i];
                end
                return temp;
            endmethod
        endinterface;

        interface osdr_cke = interface Get
            method ActionValue#(Bit#(1)) get;
                return pack(sdr_cntrl.osdr_cke());
            endmethod
        endinterface;

        ...
        ...
    endinterface;

Note that the data input is quite straightforward, as is data out, and cke: whether 8-bit, 13-bit or 64-bit, the conversion process is mundane, with only Bool having to be converted to Bit#(1) with a call to pack. The data-enable however is a massive hack: whilst 64 enable lines come in, only every 8th bit is actually utilised and passed through. Whether this should be changed is a matter for debate that is outside of the scope of this document.

Lastly for this phase we have two anomalous interfaces, exposing the control registers and the clock, that have been moved out of Ifc\_sdram\_out, to the level above (Ifc\_sdr\_slave) so that the sole set of interfaces exposed for inter-connection by the auto-generator is related exclusively to the actual pins. Resolution of these two issues is currently outside of the scope of this document.

### Clock synchronisation

Astute readers, if their heads have not exploded by this point, will have noticed earlier that the SDRAM instance was declared with an entirely different clock domain from slow peripherals. Whilst the pinmux is completely combinatorial logic that is entirely unclocked, BSV has no means of informing a *module* of this fact, and consequently a weird null-clock splicing trick is required.

This code is again auto-generated. However, it is slightly tricky to describe and there are several edge-cases, including ones where the peripheral is a slow peripheral (UART is an example) that is driven from a UART clock but is connected up inside the slow peripherals code; FlexBus is a Fast Bus peripheral that needs syncing up; RGB/TTL likewise; JTAG, however, is specifically declared inside the SoC and passed through, and its instantiation requires a separate incoming clock.

Examining the similarity between the creation of an SDRAM instance and the JTAG instance, the jtag code therefore looks like it is the best candidate fit. However, it passes through a reset signal as well.
Instead, we modify this to create clk0 but use the slow\_reset signal:

    class sdram(PBase):

        def get_clk_spc(self, typ):
            return "tck, slow_reset"

        def get_clock_reset(self, name, count):
            return "slow_clock, slow_reset"

The resultant synchronisation code, which accepts pairs of clock/reset tuples in order to take care of the prerequisite null reset and clock "synchronisation", looks like this:

    Ifc_sync#(Bit#(1)) sdr0_sdrcke_sync <-mksyncconnection(
                                clk0, slow_reset, slow_clock, slow_reset);
    Ifc_sync#(Bit#(1)) sdr0_sdrrasn_sync <-mksyncconnection(
                                clk0, slow_reset, slow_clock, slow_reset);
    ...
    ...
    Ifc_sync#(Bit#(64)) sdr0_d_in_sync <-mksyncconnection(
                                slow_clock, slow_reset, clk0, slow_reset);

Note that inputs are in reverse order from outputs. With the inclusion of clock synchronisation, automatic chains of mkConnections are set up between the actual peripheral and the peripheral side of the pinmux:

    mkConnection(slow_peripherals.sdr0.sdrcke,
                    sdr0_sdrcke_sync.get);
    mkConnection(sdr0_sdrcke_sync.put,
                    sdr0.ifc_sdram_out.osdr_cke);

Interestingly, if *actual* clock synchronisation were ever to be needed, it could easily be taken care of with relatively little extra work. However, it is worth emphasising that the pinmux is *entirely* unclocked zero-reset combinatorial logic.

# Adding a "Slow" Peripheral

This example will be cheating by cut/paste copying an existing very similar interface, SD/MMC. The SD/MMC interface is presently a "dummy" that will be extended later. However given that the peripherals are so very very similar, common base classes will be used, in a style that has also been used in SPI/QSPI and also UART.

## Adding the pin specifications

Looking at src/spec/pinfunctions.py we find that an emmc function actually already exists. So unlike sdram, there are no modifications to be made.
We can check that it works by creating an example, say in
src/spec/i\_class.py by adding an emmc interface to Bank B:

    ps.gpio("", ('B', 0), 0, 0, 18)
    ps.flexbus1("", ('B', 0), 1, spec=flexspec)
    ps.emmc("", ('B', 0), 3)     <---
    ps.flexbus2("", ('C', 0), 0)

We then need to generate the spec. At the top level the following
command is then run:

    $ python src/pinmux_generator.py -o i_class -s i_class

Checking the resultant markdown file ./i\_class/i\_class.mdwn, we find
that several entries have been added at the required location:

| Pin | Mux0        | Mux1        | Mux2        | Mux3        |
| --- | ----------- | ----------- | ----------- | ----------- |
|  28 | B GPIOB_B0  | B FB_AD2    |             | B EMMC_CMD  |
|  29 | B GPIOB_B1  | B FB_AD3    |             | B EMMC_CLK  |
|  30 | B GPIOB_B2  | B FB_AD4    |             | B EMMC_D0   |
|  31 | B GPIOB_B3  | B FB_AD5    |             | B EMMC_D1   |
|  32 | B GPIOB_B4  | B FB_AD6    |             | B EMMC_D2   |
|  33 | B GPIOB_B5  | B FB_AD7    |             | B EMMC_D3   |
|  34 | B GPIOB_B6  | B FB_CS0    |             | B EMMC_D4   |
|  35 | B GPIOB_B7  | B FB_CS1    |             | B EMMC_D5   |
|  36 | B GPIOB_B8  | B FB_ALE    |             | B EMMC_D6   |
|  37 | B GPIOB_B9  | B FB_OE     | B FB_TBST   | B EMMC_D7   |
|  38 | B GPIOB_B10 | B FB_RW     |             |             |

We also check i\_class/interfaces.txt to see if the single requested
emmc interface is there:

    gpiob   1
    eint    1
    mqspi   1
    emmc    1   <--
    uart    3

Also we examine the i\_class/emmc.txt file to check that it has the
right types of pin definitions:

    cmd out
    clk out
    d0  inout   bus
    d1  inout   bus
    d2  inout   bus
    d3  inout   bus
    d4  inout   bus
    d5  inout   bus
    d6  inout   bus
    d7  inout   bus

and we check the i\_class/pinmap.txt tab-separated file to see if it
contains the entries corresponding to the markdown table:

    24  A   4   gpioa_a24   mspi1_ck    jtag_tms    mmc0_d0
    25  A   4   gpioa_a25   mspi1_nss   jtag_tdi    mmc0_d1
    26  A   4   gpioa_a26   mspi1_io0   jtag_tdo    mmc0_d2
    27  A   4   gpioa_a27   mspi1_io1   jtag_tck    mmc0_d3
    28  B   4   gpiob_b0    fb_ad2                  emmc_cmd
    29  B   4   gpiob_b1    fb_ad3                  emmc_clk
    30  B   4   gpiob_b2    fb_ad4                  emmc_d0
    31  B   4   gpiob_b3    fb_ad5                  emmc_d1
    32  B   4   gpiob_b4    fb_ad6                  emmc_d2
    33  B   4   gpiob_b5    fb_ad7                  emmc_d3
    34  B   4   gpiob_b6    fb_cs0                  emmc_d4
    35  B   4   gpiob_b7    fb_cs1                  emmc_d5
    36  B   4   gpiob_b8    fb_ale                  emmc_d6
    37  B   4   gpiob_b9    fb_oe       fb_tbst     emmc_d7

This concludes this section as the purpose of the spec-generation side,
to create documentation and TSV files for the second phase, has been
fulfilled. Note that we did *not* declare in PinSpec that this
peripheral is to be added onto the fastbus, as by default peripherals
are added to a single AXI4-Lite interface.

## Adding the pinmux code auto-generator

The next phase begins with adding class support to auto-generate the
pinmux code, starting with the following command:

    $ python src/pinmux_generator.py -o i_class

The first thing to do is look at i\_class/bsv\_src/pinmux.bsv, and
search for both PeripheralSideMMC and PeripheralSideEMMC.
PeripheralSideMMC is very short and compact:

    // interface declaration between MMC and pinmux
    (*always_ready,always_enabled*)
    interface PeripheralSideMMC;
        interface Put#(Bit#(1)) cmd;
        interface Put#(Bit#(1)) clk;
        interface Put#(Bit#(4)) out;
        interface Put#(Bit#(4)) out_en;
        interface Get#(Bit#(4)) in;
    endinterface

whereas PeripheralSideEMMC is a mess:

    interface PeripheralSideEMMC;
        interface Put#(Bit#(1)) cmd;
        interface Put#(Bit#(1)) clk;
        interface Put#(Bit#(1)) d0_out;
        interface Put#(Bit#(1)) d0_outen;
        interface Get#(Bit#(1)) d0_in;
        interface Put#(Bit#(1)) d1_out;
        interface Put#(Bit#(1)) d1_outen;
        interface Get#(Bit#(1)) d1_in;
        interface Put#(Bit#(1)) d2_out;
        interface Put#(Bit#(1)) d2_outen;
        interface Get#(Bit#(1)) d2_in;
        ...
        ...
    endinterface

To correct this, we need to create an InterfaceEMMC class in
src/bsv/interface\_decl.py that generates the right code. However on
close inspection, given that the format needed is identical (except for
the number of data lines), we can probably get away with using
*exactly* the same class:

    class Interfaces(InterfacesBase, PeripheralInterfaces):

        def __init__(self, pth=None):
            InterfacesBase.__init__(self, Interface, pth,
                                    {'gpio': InterfaceGPIO,
                                     ...
                                     ...
                                     'mmc': InterfaceSD,
                                     'emmc': InterfaceSD,     <--
                                     'fb': InterfaceFlexBus,
                                     ...

and after re-running the command the output looks like this:

    interface PeripheralSideEMMC;
        interface Put#(Bit#(1)) cmd;
        interface Put#(Bit#(1)) clk;
        interface Put#(Bit#(8)) out;
        interface Put#(Bit#(8)) out_en;
        interface Get#(Bit#(8)) in;
    endinterface

Success! The class InterfaceSD appears to be sufficiently generic that
it could understand that it had been passed 8 pins' worth of data with
exactly the same names, rather than 4. This is encouraging in the sense
that re-using the SD/MMC BSV generation code should also be as easy.

## Adding the slow peripheral code-generator

So this time we will try cut/pasting src/bsv/peripheral\_gen/sdmmc.py
to create a base class, MMCBase. The only two common functions are
pinname\_out and \_mk\_pincon.

    class MMCBase(PBase):

        def pinname_out(self, pname):
            if pname in ['cmd', 'clk']:
                return pname
            return ''

        def _mk_pincon(self, name, count, typ):
            ...
            ...

Then, the sdmmc class is modified to inherit it, this time cutting
*out* those two functions and keeping the rest:

    from bsv.peripheral_gen.mmcbase import MMCBase  <--

    class sdmmc(MMCBase):    <--

And at the same time we create an emmc.py file where all occurrences of
sdmmc are replaced with emmc:

    class emmc(MMCBase):

        def slowimport(self):
            return "import emmc_dummy :: *;"

        ...
        ...

        def _mk_connection(self, name=None, count=0):
            return "emmc{0}.slave"

Finally, to use it, just as with sdram, we add the new peripheral at
the bottom of src/bsv/peripheral\_gen/base.py, in the "PFactory"
(Peripheral Factory) class:

    from gpio import gpio
    from rgbttl import rgbttl
    from flexbus import flexbus
    from emmc import emmc        <--

    for k, v in {'uart': uart,
                 'rs232': rs232,
                 'emmc': emmc,      <--

For the actual auto-generation phase, this really should be all that's
needed.
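
To illustrate why so little code is needed, here is a minimal, self-contained sketch of the pattern just described: a shared base class holding the common functions, two thin subclasses, and a name-to-class lookup. The PBase stand-in, the PFactorySketch class with its getcls method, the 'mmc' key and the sdmmc import string are all assumptions for illustration; the real classes in src/bsv/peripheral\_gen contain considerably more.

```python
class PBase(object):
    """Stand-in for the real PBase: only what this sketch needs."""

    def __init__(self, name):
        self.name = name


class MMCBase(PBase):
    """Holds the code shared by SD/MMC and eMMC."""

    def pinname_out(self, pname):
        # only cmd and clk are plain outputs; data pins are in/out
        if pname in ['cmd', 'clk']:
            return pname
        return ''


class sdmmc(MMCBase):

    def slowimport(self):
        # assumed import string, named after sdcard_dummy.bsv
        return "import sdcard_dummy :: *;"


class emmc(MMCBase):

    def slowimport(self):
        return "import emmc_dummy :: *;"

    def _mk_connection(self, name=None, count=0):
        return "emmc{0}.slave"


class PFactorySketch(object):
    """Hypothetical, simplified name-to-class registry."""

    def __init__(self):
        self.peripherals = {}
        for k, v in {'mmc': sdmmc,      # key name is an assumption
                     'emmc': emmc}.items():
            self.peripherals[k] = v

    def getcls(self, name):
        return self.peripherals[name]


factory = PFactorySketch()
periph = factory.getcls('emmc')('emmc')
print(periph.slowimport())
print(periph._mk_connection().format(0))
```

The design point is that the subclass only overrides what differs (the import line and the instance name), so a near-identical peripheral costs a handful of lines plus one dictionary entry.
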
Re-running the code-generator we can examine the auto-generated
slow\_peripherals.bsv file and can confirm that yes, an "import
emmc\_dummy" has been added, that an emmc0 instance has been created,
that it is added to the slave fabric, and that its cmd, clk and
in/out/out\_en are all connected up.

The last remaining task will therefore be to create an interim emmc
"dummy" BSV file.

## Creating the dummy emmc peripheral

Adding the actual peripheral is done in a different repository,
shakti-peripherals, which can be cloned with:

    $ git clone gitolite3@libre-riscv.org:shakti-peripherals.git

or public:

    $ git clone git://libre-riscv.org/shakti-peripherals.git

Here, what we will do is take a straight copy of
src/peripherals/sdmmc/sdcard\_dummy.bsv and call it
src/peripherals/emmc/emmc\_dummy.bsv. Then replace all occurrences of
"sdcard" with "emmc" and also update the SDBUSWIDTH from 4 to 8. Whilst
this appears wasteful it is by far the simplest and quickest way to get
working code, that should definitely, definitely be re-factored later.

The next stage is to return to the pinmux repository and add the import
of the new emmc subdirectory to the BSVINCDIR in both
src/bsv/Makefile.template and Makefile.peripherals.template:

    BSVINCDIR:= $(BSVINCDIR):../../../src/peripherals/src/peripherals/spi
    BSVINCDIR:= $(BSVINCDIR):../../../src/peripherals/src/peripherals/sdmmc
    BSVINCDIR:= $(BSVINCDIR):../../../src/peripherals/src/peripherals/emmc
    BSVINCDIR:= $(BSVINCDIR):../../../src/peripherals/src/peripherals/flexbus

Really these should also be auto-generated. Testing through compiling
can now take place.
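
As a hint of what auto-generating those Makefile lines could look like, the sketch below builds the BSVINCDIR entries from a list of peripheral directory names. Nothing like this exists in the tool as described here: the function name bsvincdir\_lines and its arguments are hypothetical, and only the output format is taken from the lines shown above.

```python
def bsvincdir_lines(peripherals,
                    base="../../../src/peripherals/src/peripherals"):
    """Hypothetical helper: one BSVINCDIR line per peripheral subdirectory.

    'peripherals' is a list of subdirectory names; 'base' is the
    relative path used in the Makefile templates shown above.
    """
    return ["BSVINCDIR:= $(BSVINCDIR):%s/%s" % (base, p)
            for p in peripherals]


for line in bsvincdir_lines(['spi', 'sdmmc', 'emmc', 'flexbus']):
    print(line)
```

With something along these lines, adding the emmc directory would amount to adding one more name to the list rather than editing two templates by hand.
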
## Compiling the BSV to verilog

Here an additional repository is required, which can be cloned as
follows:

    $ git clone gitolite3@libre-riscv.org:shakti-iclass.git

or public:

    $ git clone git://libre-riscv.org/shakti-iclass.git

The following commands then pull in the submodules automatically and
begin building the BSV:

    $ ./bin/gitmoduleupdate.sh
    $ make

As the compilation of the soc can take some time, it is possible to use
the Makefile.peripherals to compile *only* the slow\_peripherals.bsv
file (which we did not do with the sdram case because sdram is a fast
peripheral). This would be achieved with the following commands:

    $ make spec_to_pinmux
    $ make pinmux_to_bsv
    $ cd build/i_class/bsv_src
    $ make -f Makefile.peripherals gen_verilog

## Summary

This particular example has been something of a "cheat". The similarity
between SD/MMC and eMMC is so high - the only differences being the
name of the interface and the number of data pins - that it was simple
and straightforward to re-use significant amounts of code, cut/paste
style, carrying out some re-factoring where needed.

It does however have the advantage of being quite a short tutorial,
that illustrates key aspects of adding slow bus peripherals.

# Conclusion

This is not a small project, by any means. However the payoff in saved
time is enormous. The conversion of SDRAM from a hand-crafted fixed and
manually laborious task to an automated one took around a day, with
debugging and testing to follow. Once done (particularly, once done
with *prior tested peripherals*), it becomes possible to add *any
arbitrary number* of such peripherals with a single line of code, back
in the specification.