# Shakti M-Class Libre SoC

This SoC is a proposed libre design that draws in expertise from mass-volume
SoCs of the past six years and beyond, and is being designed to cover just
as wide a range of target embedded / low-power / industrial markets as those
SoCs. Pincount is to be kept low in order to reduce cost as well as increase
yields.

* See <http://rise.cse.iitm.ac.in/shakti.html> M-Class for top-level
* See [[pinouts]] for auto-generated table of pinouts (including mux)
* See [[peripheralschematics]] for example Reference Layouts
* See [[ramanalysis]] for a comprehensive analysis of why DDR3 is to be used.
* See [[todo]] for a rough list of tasks (and link to bugtracker)
* <https://bugs.libre-soc.org/show_bug.cgi?id=2>

## Rough specification.

Quad-core 28nm OpenPOWER 64-bit (OpenPOWER v3.0B core with Simple-V Vector Media / 3D
extensions), 300-pin 15x15mm BGA 0.8mm pitch, 32-bit DDR3-4/LPDDR3/4
memory interface and libre / open interfaces and accelerated hardware
functions suitable for the higher-end, low-power, embedded, industrial
and mobile space.

A 0.8mm pitch BGA allows relatively large (low-cost) VIA drill sizes
to be used (8-10mil) and 4-5mil tracks with 4mil clearance. For
details see
<http://processors.wiki.ti.com/index.php/General_hardware_design/BGA_PCB_design>

[[shakti_libre_riscv.jpg]]

## Die area estimates

* <http://hwacha.org/papers/riscv-esscirc2014-talk.pdf>
* 40nm 64-bit rocket single-core single-issue in-order: 0.14mm^2
* 40nm 16-16k L1 caches, 0.25mm^2
* <http://people.csail.mit.edu/beckmann/publications/tech.../grain_size_tr_feb_2010.pdf>
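
As a rough sanity check, the 40nm figures above can be combined into a back-of-envelope quad-core estimate. The sketch below is illustrative only. It assumes that the 0.25mm^2 cache figure is per core, that area scales ideally with the square of the feature size when moving from 40nm to 28nm (real processes scale less well), and that the Rocket numbers are a usable stand-in for the planned core. None of these assumptions come from the cited slides, and the function names are hypothetical.

```python
# Back-of-envelope die area estimate (illustrative, not a floorplan).
CORE_AREA_40NM_MM2 = 0.14  # Rocket single-issue in-order core (figure quoted above)
L1_AREA_40NM_MM2 = 0.25    # 16k+16k L1 caches, ASSUMED to be per core


def scale_area(area_mm2, from_nm, to_nm):
    """Ideal (optimistic) scaling: area goes with feature size squared."""
    return area_mm2 * (to_nm / from_nm) ** 2


def quad_core_area(node_nm, cores=4):
    """Area of `cores` cores plus their L1 caches at the given node."""
    per_core_40nm = CORE_AREA_40NM_MM2 + L1_AREA_40NM_MM2
    return cores * scale_area(per_core_40nm, 40, node_nm)


if __name__ == "__main__":
    for node in (40, 28):
        print(f"4 cores + L1 at {node}nm: {quad_core_area(node):.2f} mm^2")
```

The result excludes L2 cache, interconnect, peripherals, PHYs and the pad ring, so it is only a lower bound on the core cluster, not on the die.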

## Targeting full Libre Licensing to the bedrock.

The only barrier to being able to replicate the masks from scratch
is the proprietary cells (e.g. memory cells) designed by the Foundries:
there is a potential long-term strategy in place to deal with that issue.

The only proprietary interface utilised in the entire SoC is the DDR3/4
PHY plus Controller, which will be replaced in a future revision, making
the entire SoC exclusively designed and made from fully libre-licensed
BSD and LGPL openly and freely accessible VLSI and VHDL source.

In addition, no proprietary firmware whatsoever will be required to
operate or boot the device right from the bedrock: the entire software
stack will also be libre-licensed (even for programming the initial
proprietary DDR3/4 PHY+Controller).

# Inspiration from several sources

The design of this SoC is drawn from at least the following SoCs, which
have significant multiplexing for pinouts, reducing pincount whilst at
the same time permitting the SoC to be utilised across a very wide range
of markets:

* A10/A20 EVB <http://hands.com/~lkcl/eoma/A10-EVB-V1-2-20110726.pdf>
* RK3288 T-Firefly <http://www.t-firefly.com/download/firefly-rk3288/hardware/FR_RK3288_0930.pdf>
* Ingenic JZ4760B <ftp://ftp.ingenic.cn/SOC/JZ4760B/JZ4760B_DS_REVISION.PDF>
LEPUS Board <ftp://ftp.ingenic.cn/DevSupport/Hardware/RD4760B_LEPUS/RD4760B_LEPUS_V1.3.2.PDF>
* GPL-violating CT-PC89e <http://hands.com/~lkcl/seatron/>
and <http://lkcl.net/arm_systems/CT-PC89E/>. This was an 8.9in netbook
weighing only 0.72kg and having a 3 HOUR battery life on a single 2100mAh
cell. Its casework alone inspired a decade of copycat Chinese clone
netbooks, as it was slowly morphed from its original 8.9in up to (currently)
an 11in form-factor almost a decade later in 2017.
* A64 Reference Designs, for example this: <http://linux-sunxi.org/images/3/32/Banana_pi_BPI-M64-V1_1-Release_201609.pdf>

TI Boards such as the BeagleXXXX Series, or the Freescale iMX6
WandBoard etc., whilst interesting, have a different kind of focus
and "feel" about them, as they are typically designed by Western firms
with less access to or knowledge of the kinds of low-cost tricks deployed
to ingenious and successful effect by Chinese Design Houses. Not only
that, but Chinese Design Houses typically know the best components to buy.
Western-designed PCBs typically source exclusively from Digikey, AVNet,
Mouser etc. and the prices are often two to **TEN** times more costly as
a result.

The TI and Freescale (now NXP) series SoCs themselves are also just as
interesting to study, but again have a subtly different focus: cost of
manufacture of PCBs utilising them not being one of those primary foci.
Freescale's iMX6 is well-known for its awesome intended lifespan and support:
**nineteen** years. That does however have some unintended knock-on effects
on its pricing.

Instead, the primary input is taken from Chinese-designed SoCs, where cost
and ease of production, manufacturing and design of a PCB using the planned
SoC, as well as support for high-volume mass-produced peripherals, is
firmly a priority focus.

# Target Markets

* EOMA68 Computer Card form-factor (general-purpose, eco-conscious)
* Smartphone / Tablet (basically the same thing, different LCD/CTP size)
* Low-end (ChromeOS style) laptop
* Industrial uses when augmented by a suitable MCU (for ADC/DAC/CAN etc.)

## Common Peripherals to majority of target markets

* SPI or 8080 or [RGB/TTL](RGBTTL) or LVDS LCD display. SPI: 320x240. LVDS: 1440x900.
* LCD Backlight, requires GPIO power-control plus PWM for brightness control
* USB-OTG Port (OTG-Host, OTG Client, Charging capability)
* Baseband Modem (GSM / GPRS / 3G / LTE) requiring USB, UART, and PCM audio
* Bluetooth, requires either full UART or SD/MMC or USB, plus control GPIO
* WIFI, requires either USB (but with power penalties) or better SD/MMC
* SD/MMC for external MicroSD
* SD/MMC for on-PCB eMMC (care needed on power/boot sequence)
* NAND Flash (not recommended), requires 8080/ATI-style Bus with dedicated CS#
* Optional 4-wire [[QSPI]] NAND/NOR for boot (XIP - Execute In-place - recommended).
* Audio over [[I2S]] (5-pin: 4 for output, 1 for input), fall-back to USB Audio
* Audio also over [[AC97]]
* Some additional SPI peripherals, e.g. connection to low-power MCU.
* GPIO (EINT-capable, with wakeup) for buttons, power, volume etc.
* Camera(s) either by CSI-1 (parallel CSI) or better by USB
* I2C sensors: accelerometer, compass, etc. Each requires EINT and RST GPIO.
* Capacitive Touchpanel (I2C and also requiring EINT and RST GPIO)
* Real-time Clock (usually an I2C device but may be on-board a support MCU)
* [[PCIe]] via PXPIPE
* [[LPC]] from Raptor Engineering
* [[USB3]]
* [[RGMII]] Gigabit Ethernet

## Peripherals unique to laptop market

* Keyboard (USB or keyboard-matrix managed by MCU)
* USB, I2C or SPI Mouse-trackpad (plus button GPIO, EINT capable)

## Peripherals common to laptop and Industrial Market

* Ethernet ([[RGMII]] or better 8080-style XT/AT/ATI MCU bus for e.g. DM9000)

## Augmentation by an embedded MCU

Some functions, particularly analog ones, are tricky to implement
in an early SoC. In addition, CAN was patented (although this is no longer
the case). For unusual, patented
or analog functionality such as RTC, ADC, DAC, SPDIF, One-wire Bus
and so on it is easier and simpler to deploy an ultra-low-cost low-speed
companion Micro-Controller such as the crystal-less STM8S003 ($0.24) or
the crystal-less STM32F072 or other suitable MCU, depending on requirements.
For high-speed interconnect it may be wired up as an SPI device, and for
lower-speed communication UART would be the simplest and easiest means of
two-way communication.

This technique can be deployed in all scenarios (phone, tablet, laptop,
industrial), and is an extremely low-cost way of getting RTC functionality
for example. The cost of, for example, dedicated I2C devices that provide
RTC functionality, or ADC or DAC or "Digipot", is actually incredibly
high, relatively speaking. Some very simple software and a general-purpose
MCU does the exact same job. In particularly cost-sensitive applications,
a DAC may be substituted by a PWM, an RC circuit, and an optional feedback
loop into an ADC pin to monitor situations where changing load on the RC
circuit alters the output voltage. All of this is done entirely in the MCU's software.
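
The PWM-plus-RC substitute for a DAC can be sketched as follows. The average voltage of an RC-filtered PWM signal is the duty cycle multiplied by the supply voltage, and the optional ADC feedback nudges the compare value when load pulls the output away from the target. This is a model in Python rather than MCU firmware; the function names, the 3.3V supply, the 10-bit PWM resolution and the proportional gain are all assumptions made for illustration.

```python
def duty_for_voltage(target_v, vcc=3.3, pwm_bits=10):
    """PWM compare value whose RC-filtered average is closest to target_v."""
    if not 0.0 <= target_v <= vcc:
        raise ValueError("target voltage outside the 0..vcc range")
    top = (1 << pwm_bits) - 1          # largest compare value
    return round(target_v / vcc * top)


def correct_duty(compare, target_v, measured_v, vcc=3.3, pwm_bits=10, gain=0.5):
    """One step of a proportional feedback loop driven by an ADC reading.

    measured_v is the filtered output voltage as read back through the ADC.
    The result is clamped to the valid compare range.
    """
    top = (1 << pwm_bits) - 1
    error_counts = (target_v - measured_v) / vcc * top
    return max(0, min(top, round(compare + gain * error_counts)))


if __name__ == "__main__":
    compare = duty_for_voltage(1.0)
    # Pretend the load has dragged the output down to 0.9V.
    print(compare, correct_duty(compare, 1.0, 0.9))
```

A gain below 1 is used so that ADC noise does not make the output oscillate; a real design would pick the gain and update rate to suit the RC time constant.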

An MCU may even be used to emulate SPI "XIP" (Execute in-place) NAND
memory, such that there is no longer a need to deploy a dedicated SPI
NOR bootloader IC (which are really quite expensive). By emulating
an SPI XIP device the SoC may boot from the NAND Flash storage built-in
to the embedded MCU, or may even feed the SoC data from a USB-OTG
or other interface. This makes for an extremely flexible bootloader
capability, without the need for totally redoing the SoC masks just to
add extra BOOTROM functions.
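
To make the emulation idea concrete, the sketch below models the byte-level behaviour an MCU would need in order to answer the standard SPI READ command (opcode 0x03 followed by a 24-bit address), which is the simplest command a boot ROM is likely to issue. It is a Python model only: the class name is hypothetical, the wrap-around at the end of the image is an assumption, and a real MCU would implement this with its SPI slave peripheral (and most likely DMA) in order to meet the bus timing.

```python
READ_OPCODE = 0x03  # standard SPI flash READ: opcode, then a 24-bit address


class EmulatedSpiFlash:
    """Byte-level model of an MCU answering SPI READ commands from the SoC."""

    def __init__(self, image: bytes):
        if not image:
            raise ValueError("boot image must not be empty")
        self.image = image

    def transaction(self, mosi: bytes) -> bytes:
        """Handle one chip-select-low period and return the MISO bytes.

        The reply has the same length as mosi, because on SPI one byte is
        shifted out for every byte shifted in.
        """
        miso = bytearray(len(mosi))
        if len(mosi) < 4 or mosi[0] != READ_OPCODE:
            return bytes(miso)  # unsupported command: reply with zeros
        addr = int.from_bytes(mosi[1:4], "big")
        for i in range(4, len(mosi)):
            # Data follows the address directly; wrap at the end of the image
            # (an assumption, chosen to keep the model total).
            miso[i] = self.image[(addr + i - 4) % len(self.image)]
        return bytes(miso)
```

Because the image is just a byte buffer, the same structure would work whether the data comes from the MCU's internal Flash or is fetched on demand over USB-OTG, as described above.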

## Common Internal (on-board) acceleration and hardware functions

* 2D accelerated display
* 3D accelerated graphics
* Video encode / decode
* Image encode / decode
* Crypto functions (SHA, Rijndael, DES, etc., Diffie-Hellman, RSA)
* Cryptographically-secure PRNG (hard to get right)

### 2D acceleration

The ORSOC GPU contains basic primitives for 2D: rectangles, sprites,
image acceleration, scalable fonts, Z-buffering and much more.

<https://opencores.org/project,orsoc_graphics_accelerator>

<https://github.com/m-labs/milkymist/tree/master/cores/tmu2>

### 3D acceleration

* MIAOW: ATI-compatible shader engine <http://miaowgpu.org/>
* ORSOC GPU contains some primitives that can be used
* Simple-V Vector extensions can obviate the need for a "full" separate GPU
* Nyuzi (OpenMP, based on Intel Larrabee Compute Engine)
* Rasteriser <https://github.com/jbush001/ChiselGPU/tree/master/hardware>
* OpenShader <https://git.code.sf.net/p/openshader/code>
* GPLGPU <https://github.com/asicguy/gplgpu>
* FlexGripPlus <https://github.com/Jerc007/Open-GPGPU-FlexGrip->

### Video encode / decode

* video primitives <https://opencores.org/project,video_systems>
* MPEG decoder <https://opencores.org/project,mpeg2fpga>
* Google make free VP8 and VP9 hard macros available for production use only

### Image encode / decode

Partially covered by the ORSOC GPU.

### Crypto functions

TBD

### Cryptographically-secure PRNG

TBD

# Proposed Interfaces

* Plain [[GPIO]] multiplexed with a [[pinmux]] onto (nearly) all other pins
* RGB/TTL up to 1440x900 @ 60fps, 24-bit colour
* 2x 1-lane [[SPI]]
* 1x 4-lane (quad) [[QSPI]]
* 4x SD/MMC (1x 1/2/4/8-bit, 3x 1/2/4-bit)
* 2x full [[UART]] incl. CTS/RTS
* 3x [[UART]] (TX/RX only)
* 3x [[I2C]] (in case of address clashes between peripherals)
* 8080-style AT/XT/ATI MCU Bus Interface, with multiple (8x CS#) lines
* 3x [[PWM]]-capable GPIO
* 32x [[EINT]]-capable GPIO with full edge-triggered and low/high IRQ capability
* 1x [[I2S]] audio with 4-wire output and 1-wire input.
* 3x [[USB2]] ([[ULPI]] for reduced pincount) each capable of USB-OTG support
* [[DDR]] DDR3/DDR3L/LPDDR3 32-bit-wide memory controller
* [[JTAG]] for debugging
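
The RGB/TTL entry in the list above sets a floor on how much memory bandwidth display scan-out alone will consume. The arithmetic below is a minimal sketch with hypothetical function names. It counts active pixels only, so the real pixel clock (which also covers horizontal and vertical blanking) will be higher than the active pixel rate shown here.

```python
def active_pixel_rate(width, height, fps):
    """Visible pixels per second (blanking intervals not included)."""
    return width * height * fps


def scanout_bytes_per_second(width, height, fps, bits_per_pixel):
    """Framebuffer read bandwidth needed just to refresh the display."""
    return active_pixel_rate(width, height, fps) * bits_per_pixel // 8


if __name__ == "__main__":
    rate = active_pixel_rate(1440, 900, 60)
    bandwidth = scanout_bytes_per_second(1440, 900, 60, 24)
    print(f"{rate / 1e6:.2f} Mpixel/s, {bandwidth / 1e6:.2f} MB/s")
```

For 1440x900 at 60fps and 24 bits per pixel this comes to 77.76 million active pixels per second, or roughly 233 MB/s of continuous reads from the DDR interface, before any rendering or compositing traffic is counted.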

Some interfaces at:

* <https://github.com/RoaLogic/apb4_gpio>
* <https://github.com/sifive/sifive-blocks/tree/master/src/main/scala/devices/>
includes GPIO, SPI, UART, JTAG, I2C, PinCtrl and PWM. Also included
is a Watchdog Timer and others.
* <https://github.com/sifive/freedom/blob/master/src/main/scala/everywhere/e300artydevkit/Platform.scala>
Pinmux ("IOF") for multiplexing several I/O functions onto a single pin
* <https://bitbucket.org/casl/c-class/src/0e77398a030bfd705930d0f1b8b9b5050d76e265/src/peripherals/?at=master>
including AXI, DMA, GPIO, I2C, JTAG, PLIC, QSPI, SDRAM, UART (and TCM?).
FlexBus, HyperBus and xSPI to be added.

List of Interfaces:

* [[CSI]]
* [[DDR]]
* [[JTAG]]
* [[I2C]]
* [[I2S]]
* [[PWM]]
* [[EINT]]
* [[FlexBus]]
* LCD / RGB/TTL [[RGBTTL]]
* [[SPI]]
* [[QSPI]]
* SD/MMC and eMMC [[sdmmc]]
* Pin Multiplexing [[pinmux]]
* Gigabit Ethernet [[RGMII]]
* SDRAM [[sdram]]

List of Internal Interfaces:

* [[AXI]]
* [[wishbone]]

# Items requiring clarification, or proposals TBD

## Core Voltage Domains from the PMIC

See [[peripheralschematics]] - what default (start-up) voltage can the
core of the proposed 28nm SoC cope with for short durations? The AXP209
PMIC defaults to a 1.25v CPU core voltage, and 1.2v for the logic. It
can be changed by the SoC by communicating over I2C but the start-up
voltage of the PMIC may not be changed. What is the maximum voltage
that the SoC can run at, for short durations at a greatly-reduced clock rate?

## 3.3v tolerance

Can the GPIO be made at least 3.3v tolerant?

## Shakti FlexBus implementation: 32-bit word-aligned access

The FlexBus implementation may only make accesses onto the back-end
AXI bus on 32-bit word-aligned boundaries. How this affects FlexBus
memory accesses (read and write) on 8-bit and 16-bit boundaries is
yet to be determined. It is particularly relevant e.g. for 24-bit
pixel accesses on 8080 (MCU) style LCD controllers that have their
own on-board SRAM.
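
To illustrate why the alignment restriction matters, the sketch below splits an arbitrary byte-addressed access into 32-bit word-aligned beats, each with a byte-enable mask. This is not the Shakti FlexBus implementation: it is a generic model of one way a bridge could behave, and the function name and the little-endian byte-lane numbering are assumptions.

```python
def split_access(addr, size, word_bytes=4):
    """Split a byte-addressed access into word-aligned beats.

    Returns a list of (aligned_address, byte_enable_mask) tuples, where
    bit i of the mask enables byte lane i of that word (little-endian lane
    numbering assumed).
    """
    if size <= 0:
        raise ValueError("size must be positive")
    beats = []
    end = addr + size
    word = addr - (addr % word_bytes)   # round down to a word boundary
    while word < end:
        lo = max(addr, word) - word               # first byte lane used
        hi = min(end, word + word_bytes) - word   # one past the last lane
        beats.append((word, ((1 << (hi - lo)) - 1) << lo))
        word += word_bytes
    return beats


if __name__ == "__main__":
    # Packed 24-bit pixels: pixel n starts at byte address 3 * n.
    for n in range(4):
        print(n, [(hex(a), bin(m)) for a, m in split_access(3 * n, 3)])
```

With packed 24-bit pixels, two out of every four pixels straddle a word boundary and need two aligned beats, so the open question above is whether the bridge can issue partial-word (byte-enabled) beats or has to fall back to read-modify-write.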

## Confirmation of GPIO Power Domains

The proposed plan is to stick with a fixed 1.8v GPIO level across all
GPIO banks. However as outlined in the section above, this has some
distinct disadvantages, particularly for e.g. SRAM access over FlexBus:
that would often require a 50-way bi-directional level-shifter Bus IC,
with over 100 pins!

## Proposal / Concept to include "Minion Cores" on a 7-way pinmux

The lowRISC team first came up with the idea, instead of having a pinmux,
to effectively bit-bang pretty much all GPIO using **multiple** 32-bit
RISC-V non-SMP integer-only cores each with a tiny instruction and data
cache (or, simpler, access to their own independent on-die SRAM).
The reasoning behind this is: if it's a dedicated core, it's not really
bit-banging any more. The technique is very commonly deployed, typically
using an 8051 MCU engine, as it means that a mass-produced peripheral may
be firmware-updated in the field, for example if a Standard has unanticipated
flaws or otherwise requires updating.

The proposal here is to add four extra pin-mux selectors (an extra bit
to what is currently a 2-bit mux per pin), and for each GPIO bank to map to
one of four such ultra-small "Minion Cores". For each pin, Pin-mux 4 would
select the first Minion core, Pin-mux 5 would select the second and so on.
The sizes of the GPIO banks are as follows:

* Bank A: 16
* Bank B: 28
* Bank C: 24
* Bank D: 24
* Bank E: 24
* Bank F: 10

Therefore, it is proposed that each Minion Core have 28 EINT-capable
GPIOs, and that all but Bank A and F map their GPIO number (minus the
Bank Designation letter) direct to the Minion Core GPIOs. For Banks
A and F, the numbering is proposed to be concatenated, so that A0 through
A15 maps to a Minion Core's GPIO 0 to 15, and F0 to F9 map to a Minion
Core's GPIO 16 to 25 (another alternative idea would be to split Banks
A and F to complete B through E, taking them up to 32 I/O per Minion core).
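
The proposed numbering can be written down as a small lookup, which also makes it easy to check that no bank overflows the 28 GPIOs per Minion Core. This is a sketch of the mapping as described above, not an agreed design; the function and constant names are hypothetical, and it assumes pins within a bank are numbered from 0.

```python
BANK_SIZES = {"A": 16, "B": 28, "C": 24, "D": 24, "E": 24, "F": 10}
MINION_GPIOS = 28        # EINT-capable GPIOs proposed per Minion Core
FIRST_MINION_MUX = 4     # mux values 0-3 remain the existing pin functions
NUM_MINION_CORES = 4


def minion_gpio(bank, pin):
    """Map a bank letter and pin number to a Minion Core GPIO number."""
    if bank not in BANK_SIZES or not 0 <= pin < BANK_SIZES[bank]:
        raise ValueError(f"no such pin: {bank}{pin}")
    if bank == "F":
        return BANK_SIZES["A"] + pin  # Bank F is concatenated after A15
    return pin                        # Banks A to E map straight through


def minion_core_for_mux(mux):
    """Return the Minion Core index (0-3) selected by a pin-mux value."""
    if not FIRST_MINION_MUX <= mux < FIRST_MINION_MUX + NUM_MINION_CORES:
        raise ValueError("mux value does not select a Minion Core")
    return mux - FIRST_MINION_MUX


if __name__ == "__main__":
    print(minion_gpio("A", 15), minion_gpio("F", 0), minion_gpio("F", 9))
    print(minion_core_for_mux(4), minion_core_for_mux(7))
```

Under this scheme Banks A and F together occupy Minion GPIOs 0 to 25, leaving GPIOs 26 and 27 reachable only through Bank B.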

With careful selection from different banks it should be possible to map
unused spare pins to a complete, contiguous, sequential set of any given
Minion Core, such that the Minion Core could then bit-bang anything up to
a 28-bit-wide Bus. Theoretically this could make up a second RGB/TTL
LCD interface with up to 24 bits per pixel.

For low-speed interfaces, particularly those with an independent clock
where the interface takes into account that the clock changes on a different
time-cycle from the data, this should work perfectly fine. Whether the
idea is practical for higher-speed interfaces or not will critically
depend on whether the Minion Core can do mask-spread atomic
reads/writes from a register to/from memory-addressed GPIO.
Faster I/O streams will almost certainly require some form of
serialiser/de-serialiser hardware-assist, and definitely each their
own DMA Engine.
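
One possible reading of "mask-spread" reads/writes is a bit deposit / bit extract operation: the low bits of a register are scattered onto whichever GPIO positions a mask selects, and gathered back on a read. The Python below is a behavioural model of that interpretation only. The function names are hypothetical, and the hardware question raised above is whether the equivalent of `masked_write` can happen as a single atomic operation rather than a read-modify-write sequence.

```python
def spread_bits(value, mask):
    """Deposit the low bits of value into the set-bit positions of mask."""
    result, src, pos = 0, 0, 0
    while mask >> pos:
        if (mask >> pos) & 1:
            result |= ((value >> src) & 1) << pos
            src += 1
        pos += 1
    return result


def gather_bits(word, mask):
    """Extract the bits of word selected by mask into a contiguous value."""
    result, dst, pos = 0, 0, 0
    while mask >> pos:
        if (mask >> pos) & 1:
            result |= ((word >> pos) & 1) << dst
            dst += 1
        pos += 1
    return result


def masked_write(gpio_word, value, mask):
    """Update only the masked GPIO bits, leaving all other pins untouched."""
    return (gpio_word & ~mask) | spread_bits(value, mask)


if __name__ == "__main__":
    mask = 0b11010  # pins 1, 3 and 4 belong to this bit-banged interface
    print(bin(masked_write(0b00101, 0b101, mask)))
```

Done in software this costs a loop per access, which is one reason a hardware-assisted version matters for anything beyond low-speed interfaces.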

If the idea proves successful it would be extremely nice to have a
future version that has direct access to generic LVDS lines, plus
S8/10 ECC hardware-assist engines. If the voltage may be set externally
and accurate PLL clock timing provided, it may become possible to bit-bang
and software-emulate high-speed interfaces such as SATA, HDMI, PCIe and
many more.

# Testing

* cocotb
* <https://github.com/aoeldemann/cocotb> cocotb AXI4 stream interface

# Research (to investigate)

* LPC Interface <https://gitlab.raptorengineering.com/raptor-engineering-public/lpc-spi-bridge-fpga>
* <https://level42.ca/projects/ultra64/Documentation/man/pro-man/pro25/index25.1.html>
* <http://n64devkit.square7.ch/qa/graphics/ucode.htm>
* <https://dac.com/media-center/exhibitor-news/synopsys%E2%80%99-designware-universal-ddr-memory-controller-delivers-30-percent> 110nm DDR3 PHY
* <https://bitbucket.org/cfelton/minnesota> myhdl HDL cores
* B Extension proposal <https://groups.google.com/a/groups.riscv.org/forum/#!topic/isa-dev/zi_7B15kj6s>
* Bit-extracts <https://github.com/cliffordwolf/bextdep>
* Bit-reverse <http://programming.sirrida.de/bit_perm.html#general_reverse_bits>
* Bit-permutations <http://programming.sirrida.de/bit_perm.html#c_e>
* Commentary on Micro-controller <https://github.com/emb-riscv/specs-markdown/blob/develop/improvements-upon-privileged.md>
* P-SIMD <https://groups.google.com/a/groups.riscv.org/forum/#!topic/isa-dev/vYVi95gF2Mo>

[[!tag cpus]]