+
+\frame{\frametitle{Hardware / Development Complexity Comparison}
+
+ \begin{itemize}
+ \item {\bf Server}: relatively easy. PCIe, RapidIO, XAUI, SATA, GbE, 10GE,
+ DDR3/4 (or HMC), etc. No multiplexing: every interface has dedicated
+ pins, mostly high-speed differential pairs.
+ \item {\bf Desktop}: really just a variant of Server.
+ Graphics is a PCIe card (unless integrated). Peripherals are
+ often handled by dedicated external ICs (the ``Southbridge'' concept).
+ \item {\bf Embedded}: also pretty easy, but really needs a pinmux. Low clock
+ rate, low-power modes. E.g.\ SiFive Freedom U310.
+ \item {\bf Mobile}: HARD. Performance/Watt matters $\Rightarrow$ variable
+ voltage domains {\it per core}. Pin count matters (it affects
+ yield and package cost). Cost
+ matters. The pinmux is critical.
+ \end{itemize}
+ {\it Bottom line: Mobile-class processors are challenging!}
+}
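+
+
+\frame{\frametitle{Aside: Why Per-Core Voltage Domains?}
+
+ A rough sketch using the standard CMOS dynamic-power approximation
+ ($\alpha$: activity factor, $C$: switched capacitance, $V$: supply
+ voltage, $f$: clock frequency):
+ \[
+ P_{\mathrm{dyn}} \approx \alpha C V^2 f
+ \]
+ Hypothetical example: a lightly-loaded core drops to half frequency,
+ which (by assumption) lets its own rail fall from 1.0\,V to 0.8\,V:
+ \[
+ \frac{P'}{P} = \left(\frac{0.8}{1.0}\right)^2 \times \frac{1}{2} = 0.32
+ \]
+ {\it Only possible if that core has its own voltage domain. On a
+ shared rail, the busy cores hold $V$ at 1.0\,V for everyone.}
+}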
+
+
+\frame{\frametitle{Proprietary vs Libre-licensed Interface HDL}
+
+ \begin{itemize}
+ \item DDR3/4: challenging! USD \$1m for a single-use, single-instance licence.\\
+ Symbiotic EDA: USD \$600k for the PHY; CERN has developed a controller.\\
+ http://libre-riscv.org/shakti/m\_class/DDR/
+ \item HyperRAM (JEDEC xSPI): lower risk than DDR3/4\\
+ http://libre-riscv.org/shakti/m\_class/HyperRAM/
+ \item RGMII: several libre implementations available (saves USD \$50k)\\
+ http://libre-riscv.org/shakti/m\_class/RGMII/
+ \item UART, SPI, I2C, PWM, SD/MMC: all libre (except eMMC).
+ \item The Shakti group has FlexBus, QuadSPI, SRAM and many more.
+ \item RGB/TTL: R. Herveille (SSD2828, SN75LVDS83b, TFP410a)
+ \end{itemize}
+ {\it Basically, there's no compelling reason to spend vast sums
+ on proprietary HDL. Sorry, Cadence / Mentor / Synopsys / whoever!}
+}
+
+
+\frame{\frametitle{Challenging Stuff [1] - Memory Interfaces}
+
+ \begin{itemize}
+ \item DDR3/4 PHYs are analog and very high-speed: impedance
+ calibration and training, extreme timing tolerances on wide parallel buses.\\
+ No surprise they cost USD \$1m and above.
+ \item Symbiotic EDA will do a (libre) PHY layout for USD \$300k;
+ time to completion for a chosen geometry: 8--12 months.
+ \end{itemize}
+ {\it Silicon-proven but still risky. What are the alternatives?}
+ \vspace{4pt}
+ \begin{itemize}
+ \item 133\,MHz 32-bit SDRAM (um...), or maybe even FlexBus?
+ \item HyperRAM (aka JEDEC xSPI): 8-bit SPI, 166\,MHz or DDR-300.\\
+ 300\,MB/s for only 13 wires, not bad! (We'll take several.)\\
+ http://libre-riscv.org/shakti/m\_class/HyperRAM/
+ \item HMC: insanely fast, very low power. OpenHMC (LGPL)\\
+ https://opencores.org/project/openhmc
+ \end{itemize}
+}
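+
+
+\frame{\frametitle{Aside: Bandwidth per Wire (Rough Arithmetic)}
+
+ Peak figures only, ignoring command and latency overhead. HyperRAM
+ moves one byte per transfer, and DDR-300 is 300\,MT/s (150\,MHz, both
+ clock edges):
+ \[
+ 1\,\mathrm{B} \times 300\,\mathrm{MT/s} = 300\,\mathrm{MB/s},
+ \qquad \frac{300\,\mathrm{MB/s}}{13\ \mathrm{wires}}
+ \approx 23\,\mathrm{MB/s\ per\ wire}
+ \]
+ 32-bit SDRAM at 133\,MHz (single data rate), for comparison:
+ \[
+ 4\,\mathrm{B} \times 133\,\mathrm{MHz} \approx 532\,\mathrm{MB/s}
+ \]
+ Taking several: with a hypothetical 4 HyperRAM ports,
+ \[
+ 4 \times 300\,\mathrm{MB/s} = 1.2\,\mathrm{GB/s}
+ \ \mathrm{for}\ 4 \times 13 = 52\ \mathrm{wires}
+ \]
+}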
+
+
+\frame{\frametitle{Challenging Stuff [2] - Video Decode Engine}
+
+ \begin{itemize}
+ \item Richard Herveille's Video Core Blocks\\
+ https://opencores.org/project/video\_systems
+ \item Symbiotic EDA MP4 decoder in FPGA
+ \item H.264 seems to have been done...\\
+ https://github.com/adsc-hls/synthesizable\_h264
+ \item Really needs SIMD (or better, not-SIMD)\\
+ {http://libre-riscv.org/simple\_v\_extension/}
+ \item Definitely needs xBitManip (parallelised by Simple-V)\\
+ https://github.com/cliffordwolf/xbitmanip
+ \end{itemize}
+ {\it SIMD is insane: $O(N^6)$ opcode proliferation. See\\
+ https://www.sigarch.org/simd-instructions-considered-harmful/ \\
+ (1) P-Ext was designed for audio. (2) Investigate RI5CY's SIMD.
+ }
+}
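+
+
+\frame{\frametitle{Aside: Counting Opcode Proliferation (Sketch)}
+
+ Illustrative counting only. It assumes fixed-width SIMD encodes element
+ width and register width in the opcode, while Simple-V holds them as
+ state. Let $n_{op}$ = base operations, $n_{ew}$ = element widths,
+ $n_{rw}$ = SIMD register widths:
+ \[
+ N_{\mathrm{SIMD}} \approx n_{op} \times n_{ew} \times n_{rw},
+ \qquad N_{\mathrm{SV}} \approx n_{op}
+ \]
+ Hypothetical example: $n_{ew} = 4$ (8/16/32/64-bit) and $n_{rw} = 3$
+ (128/256/512-bit):
+ \[
+ N_{\mathrm{SIMD}} \approx 12\,n_{op} \quad\mathrm{vs.}\quad N_{\mathrm{SV}} \approx n_{op}
+ \]
+ {\it Every extra axis (saturation, signedness, \ldots) multiplies the count again.}
+}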
+
+
+\frame{\frametitle{Challenging Stuff [3] - Libre 3D GPU. Sigh.}
+
+ \begin{itemize}
+ \item Actual requirements are quite modest: 30\,MP/s, 100\,MT/s, 5\,GFLOPS,
+ but power and area are crucial ($2\,\mathrm{mm}^2$ @ 40\,nm)
+ \item Candidates: Nyuzi, MIAOW, GPLGPU (Number Nine), OGP (Open Graphics Project).
+ \item Nyuzi is based on Larrabee. Jeff Bush has been really helpful.
+ \item MIAOW is an OpenCL engine. GPLGPU is fixed-function.
+ \item Nyuzi lessons: software-only rendering is not enough.
+ Getting data through the L1 cache takes most of the power. Fixed-function
+ units, such as parallel FP-quad to ARGB pixel conversion and Z-buffering,
+ are needed.
+ \item Fallback is GC800 (USD \$250k). {\it Contact me if you can do better!}
+ \end{itemize}
+ {\it Jacob Bachmeyer's cache-control proposal turns the L1 cache into
+ scratchpad RAM. RVV is just too heavy (sorry!); Simple-V is much
+ more lightweight and flexible ($O(1)$ ISA proliferation).
+ }
+}
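+
+
+\frame{\frametitle{Aside: What Does 5\,GFLOPS Need? (Sketch)}
+
+ Back-of-envelope only. The lane count and clocks below are
+ hypothetical, and a fused multiply-add counts as 2 FLOPs:
+ \[
+ 4\ \mathrm{lanes} \times 2\ \mathrm{FLOPs/FMA} \times 625\,\mathrm{MHz}
+ = 5 \times 10^{9}\ \mathrm{FLOP/s}
+ \]
+ Equivalently, at a hypothetical 800\,MHz:
+ \[
+ \frac{5 \times 10^{9}}{8 \times 10^{8}} = 6.25\ \mathrm{FLOPs/cycle}
+ \]
+ {\it The arithmetic is modest. The hard part is fitting it into
+ $2\,\mathrm{mm}^2$ and the power budget.}
+}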
+
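+\frame{\frametitle{Aside: Two Fixed Functions as Formulas (Sketch)}
+
+ This is one plausible definition of each. It assumes components are
+ clamped to $[0,1]$, 8 bits per channel, and a ``less-than'' depth test.
+ Real hardware may round, order channels or compare differently.
+ \[
+ q(x) = \mathrm{round}\bigl(255 \cdot \min(\max(x,0),1)\bigr)
+ \]
+ \[
+ \mathrm{ARGB} = q(a)\cdot 2^{24} + q(r)\cdot 2^{16} + q(g)\cdot 2^{8} + q(b)
+ \]
+ E.g.\ $(a,r,g,b) = (1,1,0,0) \mapsto$ \texttt{0xFFFF0000}.
+ \[
+ \mathrm{write}(x,y) \iff z_{\mathrm{new}} < Z[x,y],
+ \quad\mathrm{then}\ Z[x,y] \leftarrow z_{\mathrm{new}}
+ \]
+ {\it The four $q(\cdot)$ conversions are independent, so hardware can
+ run them side by side.}
+}
+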
+