\documentclass[slidestop]{beamer}
\usepackage{beamerthemesplit}

\title{Data-Dependent-Fail-First}
\author{Luke Kenneth Casson Leighton and Shriya Sharma}
\huge{The Libre-SOC Hybrid 3D CPU}\\
\Large{Data-Dependent-Fail-First}\\
\large{Sponsored by NLnet's PET Programme}\\
\frame{\frametitle{Why another SoC?}

 \begin{itemize}
   \item Intel Management Engine, Apple QA issues, Spectre\vspace{6pt}
   \item Endless proprietary drivers, "simplest" solution: \\
         License proprietary hard macros (with proprietary firmware)\\
         Adversely affects product development cost\\
         due to opaque driver bugs (Samsung S3C6410 / S5P100)
   \item Alternative: Intel and Valve-Steam collaboration\\
         "Most productive business meeting ever!"\\
         https://tinyurl.com/valve-steam-intel
   \item Because for 30 years I Always Wanted To Design A CPU
   \item Ultimately it is a strategic \textit{business} objective to
         develop entirely Libre hardware, firmware and drivers.
 \end{itemize}
}
\frame{\frametitle{How can you help?}

 \begin{itemize}
   \item Start here! https://libre-soc.org \\
         Mailing lists https://lists.libre-soc.org \\
         IRC Freenode libre-soc \\
         etc. etc. (it's a Libre project, go figure)
   \item Can I get paid? Yes! NLnet funded\\
         See https://libre-soc.org/nlnet/\#faq
   \item Also profit-sharing in any commercial ventures
   \item How many opportunities to develop Libre SoCs exist,\\
         and actually get paid for it?
   \item I'm not a developer, how can I help?\\
         - Plenty of research needed, artwork, website \\
         - Help find customers and OEMs willing to commit (LOI)
 \end{itemize}
}
\frame{\frametitle{What goes into a typical SoC?}

 \begin{itemize}
   \item 15 to 20mm BGA package: 2.5 to 5 watt power consumption\\
         heat sink normally not required (simplifies overall design)
   \item Fully-integrated peripherals (not Northbridge/Southbridge)\\
         USB, HDMI, RGB/TTL, SD/MMC, I2C, UART, SPI, GPIO etc. etc.
   \item Built-in GPU (shared memory bus, 3rd party licensed)\vspace{3pt}
   \item Built-in VPU (likewise, proprietary)\vspace{3pt}
   \item Target price between \$2.50 and \$30 depending on market\\
         Radically different from IBM POWER9 Core (200 Watt)
   \item We're doing the same, just with a hybrid architecture.
 \end{itemize}
}
\frame{\frametitle{Simple SBC-style SoC}

 \includegraphics[width=0.6\textwidth]{pospopcount.png}
}
\begin{frame}[fragile]
\frametitle{Simple-V CMPI in a nutshell}

\begin{semiverbatim}
function op\_cmpi(BA, RA, SI) # cmpi not vector-cmpi!
  (assuming you know power-isa)
  int id = 0, ira = 0;
  for (i = 0; i < VL; i++)
    CR[BA+id] <= compare(ireg[RA+ira], SI);
    if (reg\_is\_vectorised[BA]) \{ id += 1; \}
    if (reg\_is\_vectorised[RA]) \{ ira += 1; \}
\end{semiverbatim}

\begin{itemize}
  \item Above is oversimplified: predication etc. left out
  \item Scalar-scalar, scalar-vector and vector-vector now all in one
  \item OoO may choose to push CMPIs into instr. queue (v. busy!)
\end{itemize}
\end{frame}
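The loop on the slide above can be sketched in Python. This is a minimal illustration, not the Simple-V specification: the register files are plain lists, `compare` returns a string instead of setting CR bits, and the names `sv_cmpi`, `ba_is_vec` and `ra_is_vec` are assumptions made for this example. Predication is left out, as on the slide.

```python
def compare(value, imm):
    """Three-way compare; a stand-in for the CR field that cmpi would set."""
    if value < imm:
        return "lt"
    if value > imm:
        return "gt"
    return "eq"


def sv_cmpi(cr, ireg, BA, RA, SI, VL, ba_is_vec, ra_is_vec):
    """Run cmpi VL times, stepping only through operands marked as vectors."""
    idx_cr = 0  # offset into the CR file ("id" on the slide)
    idx_ra = 0  # offset into the integer register file ("ira" on the slide)
    for _ in range(VL):
        cr[BA + idx_cr] = compare(ireg[RA + idx_ra], SI)
        if ba_is_vec:
            idx_cr += 1
        if ra_is_vec:
            idx_ra += 1
    return cr


# Vector-vector case: four source registers, four CR fields.
ireg = [0, 0, 0, 0, 5, 9, 2, 7]
cr = sv_cmpi([None] * 8, ireg, BA=0, RA=4, SI=5, VL=4,
             ba_is_vec=True, ra_is_vec=True)
print(cr[0:4])
```

With a scalar RA and a vector BA the same loop compares one register repeatedly, which is how a single definition covers the scalar-vector case.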
\frame{\frametitle{Load/Store Fault-First}

 \begin{itemize}
   \item Problem: vector load and store can cause a page fault
   \item Solution: a protocol that allows optional load/store
   \item instruction \textit{requests} a number of elements
   \item instruction \textit{informs} the number actually loaded
   \item first load/store is not optional
   \item Load/Store is Memory to/from Register, what about
         Register to Register?
   \item Register-to-register: "Data-Dependent Fail-First."
   \item Z80 LDIR: Mem-Register, CPIR: Register-Register
 \end{itemize}
}
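The request/inform protocol above can be sketched with a toy memory model. This is only an illustration under stated assumptions: memory is a Python dict of mapped addresses, a missing key stands in for a page fault, and `PageFault`, `read_mem` and `load_fault_first` are hypothetical names, not part of any ISA or library.

```python
class PageFault(Exception):
    """Raised by the toy memory model when an unmapped address is read."""


def read_mem(memory, addr):
    # Any address that is not a key of the dict counts as unmapped.
    if addr not in memory:
        raise PageFault(addr)
    return memory[addr]


def load_fault_first(memory, base, requested_vl):
    """Request requested_vl elements; return (values, number actually loaded)."""
    values = []
    for i in range(requested_vl):
        try:
            values.append(read_mem(memory, base + i))
        except PageFault:
            if i == 0:
                raise  # the first element is not optional: fault as normal
            break      # later elements are optional: stop and truncate
    return values, len(values)


memory = {100: 1, 101: 2, 102: 3}
print(load_fault_first(memory, 100, 8))
```

The caller asks for eight elements and is told that three were loaded, so it can advance by three and retry. A fault on the very first element still propagates.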
\begin{frame}[fragile]
\frametitle{Data-Dependent-Fail-First in a nutshell}

\begin{semiverbatim}
function op\_cmpi(BA, RA, SI) # cmpi not vector-cmpi!
  int id = 0, ira = 0;
  for (i = 0; i < VL; i++)
    CR[BA+id] <= compare(ireg[RA+ira], SI);
    # test the element just written (inclusive truncation)
    if test (CR[BA+id]) == FAIL: \{ VL = i + 1; break \}
    if (reg\_is\_vectorised[BA]) \{ id += 1; \}
    if (reg\_is\_vectorised[RA]) \{ ira += 1; \}
\end{semiverbatim}

\begin{itemize}
  \item Parallelism still perfectly possible
        ("hold" writing results until sequential post-analysis
        carried out. Best done with OoO)
  \item VL truncation can be inclusive or exclusive
        (include or exclude a NULL pointer or a
        string-end character, or overflow result)
\end{itemize}
\end{frame}
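The difference between inclusive and exclusive VL truncation can be sketched in Python. This is an illustration under assumptions, not the SVP64 definition: the source operand is always treated as a vector, the compare result is a string, and `ddff_cmpi`, `fail_if` and `inclusive` are hypothetical names introduced here.

```python
def ddff_cmpi(ireg, RA, SI, VL, fail_if, inclusive=True):
    """Vector compare-immediate that truncates VL at the first failing test.

    fail_if is a caller-supplied predicate on the compare result.
    Returns (results, new_vl).
    """
    results = []
    for i in range(VL):
        value = ireg[RA + i]
        result = "lt" if value < SI else "gt" if value > SI else "eq"
        if fail_if(result):
            if inclusive:
                results.append(result)  # keep the failing element
                return results, i + 1
            return results, i           # drop the failing element
        results.append(result)
    return results, VL


# Scan for a zero terminator: fail on the first element equal to 0.
data = [3, 7, 0, 9]
print(ddff_cmpi(data, RA=0, SI=0, VL=4, fail_if=lambda r: r == "eq"))
print(ddff_cmpi(data, RA=0, SI=0, VL=4, fail_if=lambda r: r == "eq",
                inclusive=False))
```

Inclusive truncation suits copying a string together with its terminator. Exclusive truncation suits counting only the elements before it.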
\frame{\frametitle{Additional Simple-V features}

 \begin{itemize}
   \item "fail-on-first" (POWER9 VSX strncpy segfaults on boundary!)
   \item "Twin Predication" (covers VSPLAT, VGATHER, VSCATTER, VINDEX etc.)
   \item SVP64: extensive "tag" (Vector context) augmentation
   \item "Context propagation": a VLIW-like context. Allows contexts
         to be repeatedly applied.
         Effectively a "hardware compression algorithm" for ISAs.
   \item Ultimate goal: cut down I-Cache usage, which cuts down on power
   \item Typical GPU has its own I-Cache and small shaders.\\
         \textit{We are a Hybrid CPU/GPU: I-Cache is not separate!}
   \item Needs to go through OpenPOWER Foundation `approval'
 \end{itemize}
}
\frame{\frametitle{maxloc}
}
\frame{\frametitle{Pospopcount}

 \begin{itemize}
   \item Positional popcount counts, for each bit-position, how many
         values in an array of inputs have that bit set to 1.
   \item Notoriously difficult to do in SIMD assembler: typically 550 lines
 \end{itemize}

 \lstinputlisting[language={}]{pospopcount.c}
}
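As a reference point for the definition above, positional popcount can be written in a few lines of scalar Python. This is a plain sketch for checking results, not the contents of `pospopcount.c`; the function name and the `width` parameter are assumptions made for this example.

```python
def pospopcount(values, width=8):
    """Return counts[b] = number of values that have bit b set."""
    counts = [0] * width
    for value in values:
        for bit in range(width):
            counts[bit] += (value >> bit) & 1
    return counts


print(pospopcount([0b0001, 0b0011, 0b0111], width=4))
```

Bit 0 is set in all three inputs, bit 1 in two, bit 2 in one and bit 3 in none, so the call returns `[3, 2, 1, 0]`.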
\frame{\frametitle{Pospopcount.s}

 \lstinputlisting[language={}]{pospopcount.s}
}
\frame{\frametitle{strncpy}
}
\frame{\frametitle{strncpy assembler}

 \lstinputlisting[language={}]{strncpy.s}
}
\frame{\frametitle{linked-list walking}
}
\frame{\frametitle{Summary}

 \begin{itemize}
   \item Goal is to create a mass-volume low-power embedded SoC suitable
         for use in netbooks, chromebooks, tablets, smartphones, IoT SBCs.
   \item No way we could implement a project of this magnitude without
         nmigen (being able to use Python OO to write HDL)
   \item Collaboration with OpenPOWER Foundation and Members absolutely
         essential. No short-cuts. Standards to be developed and ratified
         so that everyone benefits.
   \item Riding the wave of huge stability of OpenPOWER ecosystem
   \item Greatly simplified open 3D and Video drivers reduce product
         development costs for customers
   \item It also happens to be fascinating, deeply rewarding, technically
         challenging, and funded by NLnet
 \end{itemize}
}
{\Huge The end\vspace{12pt}\\
 Thank you\vspace{12pt}\\
 Questions?\vspace{12pt}
}

\begin{itemize}
  \item Discussion: http://lists.libre-soc.org
  \item Freenode IRC \#libre-soc
  \item http://libre-soc.org/
  \item http://nlnet.nl/PET
  \item https://libre-soc.org/nlnet/\#faq
\end{itemize}