Merge pull request #3310 from robinsonb5-PRs/master
[yosys.git] / manual / CHAPTER_Optimize.tex
1
2 \chapter{Optimizations}
3 \label{chapter:opt}
4
5 Yosys employs a number of optimizations to generate better and cleaner results.
6 This chapter outlines these optimizations.
7
8 \section{Simple Optimizations}
9
10 The Yosys pass {\tt opt} runs a number of simple optimizations. This includes removing unused
11 signals and cells and const folding. It is recommended to run this pass after each major step
12 in the synthesis script. At the time of this writing the {\tt opt} pass executes the following
13 passes that each perform a simple optimization:
14
15 \begin{itemize}
16 \item Once at the beginning of {\tt opt}:
17 \begin{itemize}
18 \item {\tt opt\_expr}
19 \item {\tt opt\_merge -nomux}
20 \end{itemize}
21 \item Repeat until result is stable:
22 \begin{itemize}
23 \item {\tt opt\_muxtree}
24 \item {\tt opt\_reduce}
25 \item {\tt opt\_merge}
26 \item {\tt opt\_rmdff}
27 \item {\tt opt\_clean}
28 \item {\tt opt\_expr}
29 \end{itemize}
30 \end{itemize}
31
32 The following section describes each of the {\tt opt\_*} passes.
33
34 \subsection{The opt\_expr pass}
35
36 This pass performs const folding on the internal combinational cell types
37 described in Chap.~\ref{chapter:celllib}. This means a cell with all constant
38 inputs is replaced with the constant value this cell drives. In some cases
39 this pass can also optimize cells with some constant inputs.
40
41 \begin{table}
42 \hfil
43 \begin{tabular}{cc|c}
44 A-Input & B-Input & Replacement \\
45 \hline
46 any & 0 & 0 \\
47 0 & any & 0 \\
48 1 & 1 & 1 \\
49 \hline
50 X/Z & X/Z & X \\
51 1 & X/Z & X \\
52 X/Z & 1 & X \\
53 \hline
54 any & X/Z & 0 \\
55 X/Z & any & 0 \\
56 \hline
57 $a$ & 1 & $a$ \\
58 1 & $b$ & $b$ \\
59 \end{tabular}
60 \caption{Const folding rules for {\tt\$\_AND\_} cells as used in {\tt opt\_expr}.}
61 \label{tab:opt_expr_and}
62 \end{table}
63
64 Table~\ref{tab:opt_expr_and} shows the replacement rules used for optimizing
65 an {\tt\$\_AND\_} gate. The first three rules implement the obvious const folding
66 rules. Note that `any' might include dynamic values calculated by other parts
67 of the circuit. The following three lines propagate undef (X) states.
68 These are the only three cases in which it is allowed to propagate an undef
69 according to Sec.~5.1.10 of IEEE Std. 1364-2005 \cite{Verilog2005}.
70
71 The next two lines assume the value 0 for undef states. These two rules are only
72 used if no other substitutions are possible in the current module. If other substitutions
73 are possible they are performed first, in the hope that the `any' will change to
74 an undef value or a 1 and therefore the output can be set to undef.
75
76 The last two lines simply replace an {\tt\$\_AND\_} gate with one constant-1
77 input with a buffer.
78
79 Besides this basic const folding the {\tt opt\_expr} pass can replace 1-bit wide
80 {\tt \$eq} and {\tt \$ne} cells with buffers or not-gates if one input is constant.
81
82 The {\tt opt\_expr} pass is very conservative regarding optimizing {\tt \$mux} cells,
83 as these cells are often used to model decision-trees and breaking these trees can
84 interfere with other optimizations.
85
86 \subsection{The opt\_muxtree pass}
87
88 This pass optimizes trees of multiplexer cells by analyzing the select inputs.
89 Consider the following simple example:
90
91 \begin{lstlisting}[numbers=left,frame=single,language=Verilog]
92 module uut(a, y);
93 input a;
94 output [1:0] y = a ? (a ? 1 : 2) : 3;
95 endmodule
96 \end{lstlisting}
97
98 The output can never be 2, as this would require \lstinline[language=Verilog];a;
99 to be 1 for the outer multiplexer and 0 for the inner multiplexer. The {\tt
100 opt\_muxtree} pass detects this contradiction and replaces the inner multiplexer
101 with a constant 1, yielding the logic for \lstinline[language=Verilog];y = a ? 1 : 3;.
102
103 \subsection{The opt\_reduce pass}
104
105 \begin{sloppypar}
106 This is a simple optimization pass that identifies and consolidates identical input
107 bits to {\tt \$reduce\_and} and {\tt \$reduce\_or} cells. It also sorts the input
108 bits to ease identification of shareable {\tt \$reduce\_and} and {\tt \$reduce\_or} cells
109 in other passes.
110 \end{sloppypar}
111
112 This pass also identifies and consolidates identical inputs to multiplexer cells. In this
113 case the new shared select bit is driven using a {\tt \$reduce\_or} cell that combines
114 the original select bits.
115
116 Lastly this pass consolidates trees of {\tt \$reduce\_and} cells and trees of
117 {\tt \$reduce\_or} cells to single large {\tt \$reduce\_and} or {\tt \$reduce\_or} cells.
118
119 These three simple optimizations are performed in a loop until a stable result is
120 produced.
121
122 \subsection{The opt\_rmdff pass}
123
124 This pass identifies single-bit d-type flip-flops ({\tt \$\_DFF\_*}, {\tt \$dff}, and {\tt
125 \$adff} cells) with a constant data input and replaces them with a constant driver.
126
127 \subsection{The opt\_clean pass}
128
129 This pass identifies unused signals and cells and removes them from the design. It also
130 creates an \B{unused\_bits} attribute on wires with unused bits. This attribute can be
131 used for debugging or by other optimization passes.
132
133 \subsection{The opt\_merge pass}
134
135 This pass performs trivial resource sharing. This means that this pass identifies cells
136 with identical inputs and replaces them with a single instance of the cell.
137
138 The option {\tt -nomux} can be used to disable resource sharing for multiplexer
139 cells ({\tt \$mux} and {\tt \$pmux}. This can be useful as
140 it prevents multiplexer trees to be merged, which might prevent {\tt opt\_muxtree}
141 to identify possible optimizations.
142
143 \section{FSM Extraction and Encoding}
144
145 The {\tt fsm} pass performs finite-state-machine (FSM) extraction and recoding. The {\tt fsm}
146 pass simply executes the following other passes:
147
148 \begin{itemize}
149 \item Identify and extract FSMs:
150 \begin{itemize}
151 \item {\tt fsm\_detect}
152 \item {\tt fsm\_extract}
153 \end{itemize}
154
155 \item Basic optimizations:
156 \begin{itemize}
157 \item {\tt fsm\_opt}
158 \item {\tt opt\_clean}
159 \item {\tt fsm\_opt}
160 \end{itemize}
161
162 \item Expanding to nearby gate-logic (if called with {\tt -expand}):
163 \begin{itemize}
164 \item {\tt fsm\_expand}
165 \item {\tt opt\_clean}
166 \item {\tt fsm\_opt}
167 \end{itemize}
168
169 \item Re-code FSM states (unless called with {\tt -norecode}):
170 \begin{itemize}
171 \item {\tt fsm\_recode}
172 \end{itemize}
173
174 \item Print information about FSMs:
175 \begin{itemize}
176 \item {\tt fsm\_info}
177 \end{itemize}
178
179 \item Export FSMs in KISS2 file format (if called with {\tt -export}):
180 \begin{itemize}
181 \item {\tt fsm\_export}
182 \end{itemize}
183
184 \item Map FSMs to RTL cells (unless called with {\tt -nomap}):
185 \begin{itemize}
186 \item {\tt fsm\_map}
187 \end{itemize}
188 \end{itemize}
189
190 The {\tt fsm\_detect} pass identifies FSM state registers and marks them using the
191 \B{fsm\_encoding}{\tt = "auto"} attribute. The {\tt fsm\_extract} extracts all
192 FSMs marked using the \B{fsm\_encoding} attribute (unless \B{fsm\_encoding} is
193 set to {\tt "none"}) and replaces the corresponding RTL cells with a {\tt \$fsm}
194 cell. All other {\tt fsm\_*} passes operate on these {\tt \$fsm} cells. The
195 {\tt fsm\_map} call finally replaces the {\tt \$fsm} cells with RTL cells.
196
197 Note that these optimizations operate on an RTL netlist. I.e.~the {\tt fsm} pass
198 should be executed after the {\tt proc} pass has transformed all
199 {\tt RTLIL::Process} objects to RTL cells.
200
201 The algorithms used for FSM detection and extraction are influenced by a more
202 general reported technique \cite{fsmextract}.
203
204 \subsection{FSM Detection}
205
206 The {\tt fsm\_detect} pass identifies FSM state registers. It sets the
207 \B{fsm\_encoding}{\tt = "auto"} attribute on any (multi-bit) wire that matches
208 the following description:
209
210 \begin{itemize}
211 \item Does not already have the \B{fsm\_encoding} attribute.
212 \item Is not an output of the containing module.
213 \item Is driven by single {\tt \$dff} or {\tt \$adff} cell.
214 \item The \B{D}-Input of this {\tt \$dff} or {\tt \$adff} cell is driven by a multiplexer
215 tree that only has constants or the old state value on its leaves.
216 \item The state value is only used in the said multiplexer tree or by simple relational
217 cells that compare the state value to a constant (usually {\tt \$eq} cells).
218 \end{itemize}
219
220 This heuristic has proven to work very well. It is possible to overwrite it by setting
221 \B{fsm\_encoding}{\tt = "auto"} on registers that should be considered FSM state registers
222 and setting \B{fsm\_encoding}{\tt = "none"} on registers that match the above criteria
223 but should not be considered FSM state registers.
224
225 Note however that marking state registers with \B{fsm\_encoding} that are not
226 suitable for FSM recoding can cause synthesis to fail or produce invalid
227 results.
228
229 \subsection{FSM Extraction}
230
231 The {\tt fsm\_extract} pass operates on all state signals marked with the
232 \B{fsm\_encoding} ({\tt != "none"}) attribute. For each state signal the following
233 information is determined:
234
235 \begin{itemize}
236 \item The state registers
237 \item The asynchronous reset state if the state registers use asynchronous reset
238 \item All states and the control input signals used in the state transition functions
239 \item The control output signals calculated from the state signals and control inputs
240 \item A table of all state transitions and corresponding control inputs- and outputs
241 \end{itemize}
242
243 The state registers (and asynchronous reset state, if applicable) is simply determined
244 by identifying the driver for the state signal.
245
246 From there the {\tt \$mux}-tree driving the state register inputs is
247 recursively traversed. All select inputs are control signals and the leaves of the
248 {\tt \$mux}-tree are the states. The algorithm fails if a non-constant leaf
249 that is not the state signal itself is found.
250
251 The list of control outputs is initialized with the bits from the state signal.
252 It is then extended by adding all values that are calculated by cells that
253 compare the state signal with a constant value.
254
255 In most cases this will cover all uses of the state register, thus rendering the
256 state encoding arbitrary. If however a design uses e.g.~a single bit of the state
257 value to drive a control output directly, this bit of the state signal will be
258 transformed to a control output of the same value.
259
260 Finally, a transition table for the FSM is generated. This is done by using the
261 {\tt ConstEval} C++ helper class (defined in {\tt kernel/consteval.h}) that can
262 be used to evaluate parts of the design. The {\tt ConstEval} class can be asked
263 to calculate a given set of result signals using a set of signal-value
264 assignments. It can also be passed a list of stop-signals that abort the {\tt
265 ConstEval} algorithm if the value of a stop-signal is needed in order to
266 calculate the result signals.
267
268 The {\tt fsm\_extract} pass uses the {\tt ConstEval} class in the following way
269 to create a transition table. For each state:
270
271 \begin{enumerate}
272 \item Create a {\tt ConstEval} object for the module containing the FSM
273 \item Add all control inputs to the list of stop signals
274 \item Set the state signal to the current state
275 \item Try to evaluate the next state and control output \label{enum:fsm_extract_cealg_try}
276 \item If step~\ref{enum:fsm_extract_cealg_try} was not successful:
277 \begin{itemize}
278 \item Recursively goto step~\ref{enum:fsm_extract_cealg_try} with the offending stop-signal set to 0.
279 \item Recursively goto step~\ref{enum:fsm_extract_cealg_try} with the offending stop-signal set to 1.
280 \end{itemize}
281 \item If step~\ref{enum:fsm_extract_cealg_try} was successful: Emit transition
282 \end{enumerate}
283
284 Finally a {\tt \$fsm} cell is created with the generated transition table and added to the
285 module. This new cell is connected to the control signals and the old drivers for the
286 control outputs are disconnected.
287
288 \subsection{FSM Optimization}
289
290 The {\tt fsm\_opt} pass performs basic optimizations on {\tt \$fsm} cells (not including state
291 recoding). The following optimizations are performed (in this order):
292
293 \begin{itemize}
294 \item Unused control outputs are removed from the {\tt \$fsm} cell. The attribute \B{unused\_bits}
295 (that is usually set by the {\tt opt\_clean} pass) is used to determine which control
296 outputs are unused.
297 \item Control inputs that are connected to the same driver are merged.
298 \item When a control input is driven by a control output, the control input is removed and the transition
299 table altered to give the same performance without the external feedback path.
300 \item Entries in the transition table that yield the same output and only
301 differ in the value of a single control input bit are merged and the different bit is removed
302 from the sensitivity list (turned into a don't-care bit).
303 \item Constant inputs are removed and the transition table is altered to give an unchanged behaviour.
304 \item Unused inputs are removed.
305 \end{itemize}
306
307 \subsection{FSM Recoding}
308
309 The {\tt fsm\_recode} pass assigns new bit pattern to the states. Usually this
310 also implies a change in the width of the state signal. At the moment of this
311 writing only one-hot encoding with all-zero for the reset state is supported.
312
313 The {\tt fsm\_recode} pass can also write a text file with the changes performed
314 by it that can be used when verifying designs synthesized by Yosys using Synopsys
315 Formality \citeweblink{Formality}.
316
317 \section{Logic Optimization}
318
319 Yosys can perform multi-level combinational logic optimization on gate-level netlists using the
320 external program ABC \citeweblink{ABC}. The {\tt abc} pass extracts the combinational gate-level
321 parts of the design, passes it through ABC, and re-integrates the results. The {\tt abc} pass
322 can also be used to perform other operations using ABC, such as technology mapping (see
323 Sec.~\ref{sec:techmap_extern} for details).
324