speed up ==, hash, <, >, <=, and >= for plain_data
[nmutil.git] / src / nmutil / singlepipe.py
1 # SPDX-License-Identifier: LGPL-3-or-later
2 """ Pipeline API. For multi-input and multi-output variants, see multipipe.
3
4 This work is funded through NLnet under Grant 2019-02-012
5
6 License: LGPLv3+
7
8
9 Associated development bugs:
10 * http://bugs.libre-riscv.org/show_bug.cgi?id=148
11 * http://bugs.libre-riscv.org/show_bug.cgi?id=64
12 * http://bugs.libre-riscv.org/show_bug.cgi?id=57
13
14 Important: see Stage API (stageapi.py) and IO Control API
15 (iocontrol.py) in combination with below. This module
16 "combines" the Stage API with the IO Control API to create
17 the Pipeline API.
18
19 The one critically important key difference between StageAPI and
20 PipelineAPI:
21
22 * StageAPI: combinatorial (NO REGISTERS / LATCHES PERMITTED)
23 * PipelineAPI: synchronous registers / latches get added here
24
25 RecordBasedStage:
26 ----------------
27
28 A convenience class that takes an input shape, output shape, a
29 "processing" function and an optional "setup" function. Honestly
30 though, there's not much more effort to just... create a class
31 that returns a couple of Records (see ExampleAddRecordStage in
32 examples).
33
34 PassThroughStage:
35 ----------------
36
37 A convenience class that takes a single function as a parameter,
38 that is chain-called to create the exact same input and output spec.
39 It has a process() function that simply returns its input.
40
41 Instances of this class are completely redundant if handed to
42 StageChain, however when passed to UnbufferedPipeline they
43 can be used to introduce a single clock delay.
44
45 ControlBase:
46 -----------
47
48 The base class for pipelines. Contains previous and next ready/valid/data.
49 Also has an extremely useful "connect" function that can be used to
50 connect a chain of pipelines and present the exact same prev/next
51 ready/valid/data API.
52
53 Note: pipelines basically do not become pipelines as such until
54 handed to a derivative of ControlBase. ControlBase itself is *not*
55 strictly considered a pipeline class. Wishbone and AXI4 (master or
56 slave) could be derived from ControlBase, for example.
57 UnbufferedPipeline:
58 ------------------
59
60 A simple stalling clock-synchronised pipeline that has no buffering
61 (unlike BufferedHandshake). Data flows on *every* clock cycle when
62 the conditions are right (this is nominally when the input is valid
63 and the output is ready).
64
65 A stall anywhere along the line will result in a stall back-propagating
66 down the entire chain. The BufferedHandshake by contrast will buffer
67 incoming data, allowing previous stages one clock cycle's grace before
68 also having to stall.
69
70 An advantage of the UnbufferedPipeline over the Buffered one is
71 that the amount of logic needed (number of gates) is greatly
72 reduced (no second set of buffers basically)
73
74 The disadvantage of the UnbufferedPipeline is that the valid/ready
75 logic, if chained together, is *combinatorial*, resulting in
76 progressively larger gate delay.
77
78 PassThroughHandshake:
79 ------------------
80
81 A Control class that introduces a single clock delay, passing its
82 data through unaltered. Unlike RegisterPipeline (which relies
83 on UnbufferedPipeline and PassThroughStage) it handles ready/valid
84 itself.
85
86 RegisterPipeline:
87 ----------------
88
89 A convenience class that, because UnbufferedPipeline introduces a single
90 clock delay, when its stage is a PassThroughStage, it results in a Pipeline
91 stage that, duh, delays its (unmodified) input by one clock cycle.
92
93 BufferedHandshake:
94 ----------------
95
96 nmigen implementation of buffered pipeline stage, based on zipcpu:
97 https://zipcpu.com/blog/2017/08/14/strategies-for-pipelining.html
98
99 this module requires quite a bit of thought to understand how it works
100 (and why it is needed in the first place). reading the above is
101 *strongly* recommended.
102
103 unlike john dawson's IEEE754 FPU STB/ACK signalling, which requires
104 the STB / ACK signals to raise and lower (on separate clocks) before
105 data may proceeed (thus only allowing one piece of data to proceed
106 on *ALTERNATE* cycles), the signalling here is a true pipeline
107 where data will flow on *every* clock when the conditions are right.
108
109 input acceptance conditions are when:
110 * incoming previous-stage strobe (p.i_valid) is HIGH
111 * outgoing previous-stage ready (p.o_ready) is LOW
112
113 output transmission conditions are when:
114 * outgoing next-stage strobe (n.o_valid) is HIGH
115 * outgoing next-stage ready (n.i_ready) is LOW
116
117 the tricky bit is when the input has valid data and the output is not
118 ready to accept it. if it wasn't for the clock synchronisation, it
119 would be possible to tell the input "hey don't send that data, we're
120 not ready". unfortunately, it's not possible to "change the past":
121 the previous stage *has no choice* but to pass on its data.
122
123 therefore, the incoming data *must* be accepted - and stored: that
124 is the responsibility / contract that this stage *must* accept.
125 on the same clock, it's possible to tell the input that it must
126 not send any more data. this is the "stall" condition.
127
128 we now effectively have *two* possible pieces of data to "choose" from:
129 the buffered data, and the incoming data. the decision as to which
130 to process and output is based on whether we are in "stall" or not.
131 i.e. when the next stage is no longer ready, the output comes from
132 the buffer if a stall had previously occurred, otherwise it comes
133 direct from processing the input.
134
135 this allows us to respect a synchronous "travelling STB" with what
136 dan calls a "buffered handshake".
137
138 it's quite a complex state machine!
139
140 SimpleHandshake
141 ---------------
142
143 Synchronised pipeline, Based on:
144 https://github.com/ZipCPU/dbgbus/blob/master/hexbus/rtl/hbdeword.v
145 """
146
147 from nmigen import Signal, Mux, Module, Elaboratable, Const
148 from nmigen.cli import verilog, rtlil
149 from nmigen.hdl.rec import Record
150
151 from nmutil.queue import Queue
152 import inspect
153
154 from nmutil.iocontrol import (PrevControl, NextControl, Object, RecordObject)
155 from nmutil.stageapi import (_spec, StageCls, Stage, StageChain, StageHelper)
156 from nmutil import nmoperator
157
158
159 class RecordBasedStage(Stage):
160 """ convenience class which provides a Records-based layout.
161 honestly it's a lot easier just to create a direct Records-based
162 class (see ExampleAddRecordStage)
163 """
164
165 def __init__(self, in_shape, out_shape, processfn, setupfn=None):
166 self.in_shape = in_shape
167 self.out_shape = out_shape
168 self.__process = processfn
169 self.__setup = setupfn
170
171 def ispec(self): return Record(self.in_shape)
172 def ospec(self): return Record(self.out_shape)
173 def process(seif, i): return self.__process(i)
174 def setup(seif, m, i): return self.__setup(m, i)
175
176
177 class PassThroughStage(StageCls):
178 """ a pass-through stage with its input data spec identical to its output,
179 and "passes through" its data from input to output (does nothing).
180
181 use this basically to explicitly make any data spec Stage-compliant.
182 (many APIs would potentially use a static "wrap" method in e.g.
183 StageCls to achieve a similar effect)
184 """
185
186 def __init__(self, iospecfn): self.iospecfn = iospecfn
187 def ispec(self): return self.iospecfn()
188 def ospec(self): return self.iospecfn()
189
190
191 class ControlBase(StageHelper, Elaboratable):
192 """ Common functions for Pipeline API. Note: a "pipeline stage" only
193 exists (conceptually) when a ControlBase derivative is handed
194 a Stage (combinatorial block)
195
196 NOTE: ControlBase derives from StageHelper, making it accidentally
197 compliant with the Stage API. Using those functions directly
198 *BYPASSES* a ControlBase instance ready/valid signalling, which
199 clearly should not be done without a really, really good reason.
200 """
201
202 def __init__(self, stage=None, in_multi=None, stage_ctl=False, maskwid=0):
203 """ Base class containing ready/valid/data to previous and next stages
204
205 * p: contains ready/valid to the previous stage
206 * n: contains ready/valid to the next stage
207
208 Except when calling Controlbase.connect(), user must also:
209 * add i_data member to PrevControl (p) and
210 * add o_data member to NextControl (n)
211 Calling ControlBase._new_data is a good way to do that.
212 """
213 print("ControlBase", self, stage, in_multi, stage_ctl)
214 StageHelper.__init__(self, stage)
215
216 # set up input and output IO ACK (prev/next ready/valid)
217 self.p = PrevControl(in_multi, stage_ctl, maskwid=maskwid)
218 self.n = NextControl(stage_ctl, maskwid=maskwid)
219
220 # set up the input and output data
221 if stage is not None:
222 self._new_data("data")
223
224 def _new_data(self, name):
225 """ allocates new i_data and o_data
226 """
227 self.p.i_data, self.n.o_data = self.new_specs(name)
228
229 @property
230 def data_r(self):
231 return self.process(self.p.i_data)
232
233 def connect_to_next(self, nxt):
234 """ helper function to connect to the next stage data/valid/ready.
235 """
236 return self.n.connect_to_next(nxt.p)
237
238 def _connect_in(self, prev):
239 """ internal helper function to connect stage to an input source.
240 do not use to connect stage-to-stage!
241 """
242 return self.p._connect_in(prev.p)
243
244 def _connect_out(self, nxt):
245 """ internal helper function to connect stage to an output source.
246 do not use to connect stage-to-stage!
247 """
248 return self.n._connect_out(nxt.n)
249
250 def connect(self, pipechain):
251 """ connects a chain (list) of Pipeline instances together and
252 links them to this ControlBase instance:
253
254 in <----> self <---> out
255 | ^
256 v |
257 [pipe1, pipe2, pipe3, pipe4]
258 | ^ | ^ | ^
259 v | v | v |
260 out---in out--in out---in
261
262 Also takes care of allocating i_data/o_data, by looking up
263 the data spec for each end of the pipechain. i.e It is NOT
264 necessary to allocate self.p.i_data or self.n.o_data manually:
265 this is handled AUTOMATICALLY, here.
266
267 Basically this function is the direct equivalent of StageChain,
268 except that unlike StageChain, the Pipeline logic is followed.
269
270 Just as StageChain presents an object that conforms to the
271 Stage API from a list of objects that also conform to the
272 Stage API, an object that calls this Pipeline connect function
273 has the exact same pipeline API as the list of pipline objects
274 it is called with.
275
276 Thus it becomes possible to build up larger chains recursively.
277 More complex chains (multi-input, multi-output) will have to be
278 done manually.
279
280 Argument:
281
282 * :pipechain: - a sequence of ControlBase-derived classes
283 (must be one or more in length)
284
285 Returns:
286
287 * a list of eq assignments that will need to be added in
288 an elaborate() to m.d.comb
289 """
290 assert len(pipechain) > 0, "pipechain must be non-zero length"
291 assert self.stage is None, "do not use connect with a stage"
292 eqs = [] # collated list of assignment statements
293
294 # connect inter-chain
295 for i in range(len(pipechain)-1):
296 pipe1 = pipechain[i] # earlier
297 pipe2 = pipechain[i+1] # later (by 1)
298 eqs += pipe1.connect_to_next(pipe2) # earlier n to later p
299
300 # connect front and back of chain to ourselves
301 front = pipechain[0] # first in chain
302 end = pipechain[-1] # last in chain
303 self.set_specs(front, end) # sets up ispec/ospec functions
304 self._new_data("chain") # NOTE: REPLACES existing data
305 eqs += front._connect_in(self) # front p to our p
306 eqs += end._connect_out(self) # end n to our n
307
308 return eqs
309
310 def set_input(self, i):
311 """ helper function to set the input data (used in unit tests)
312 """
313 return nmoperator.eq(self.p.i_data, i)
314
315 def __iter__(self):
316 yield from self.p # yields ready/valid/data (data also gets yielded)
317 yield from self.n # ditto
318
319 def ports(self):
320 return list(self)
321
322 def elaborate(self, platform):
323 """ handles case where stage has dynamic ready/valid functions
324 """
325 m = Module()
326 m.submodules.p = self.p
327 m.submodules.n = self.n
328
329 self.setup(m, self.p.i_data)
330
331 if not self.p.stage_ctl:
332 return m
333
334 # intercept the previous (outgoing) "ready", combine with stage ready
335 m.d.comb += self.p.s_o_ready.eq(self.p._o_ready & self.stage.d_ready)
336
337 # intercept the next (incoming) "ready" and combine it with data valid
338 sdv = self.stage.d_valid(self.n.i_ready)
339 m.d.comb += self.n.d_valid.eq(self.n.i_ready & sdv)
340
341 return m
342
343
344 class BufferedHandshake(ControlBase):
345 """ buffered pipeline stage. data and strobe signals travel in sync.
346 if ever the input is ready and the output is not, processed data
347 is shunted in a temporary register.
348
349 Argument: stage. see Stage API above
350
351 stage-1 p.i_valid >>in stage n.o_valid out>> stage+1
352 stage-1 p.o_ready <<out stage n.i_ready <<in stage+1
353 stage-1 p.i_data >>in stage n.o_data out>> stage+1
354 | |
355 process --->----^
356 | |
357 +-- r_data ->-+
358
359 input data p.i_data is read (only), is processed and goes into an
360 intermediate result store [process()]. this is updated combinatorially.
361
362 in a non-stall condition, the intermediate result will go into the
363 output (update_output). however if ever there is a stall, it goes
364 into r_data instead [update_buffer()].
365
366 when the non-stall condition is released, r_data is the first
367 to be transferred to the output [flush_buffer()], and the stall
368 condition cleared.
369
370 on the next cycle (as long as stall is not raised again) the
371 input may begin to be processed and transferred directly to output.
372 """
373
374 def elaborate(self, platform):
375 self.m = ControlBase.elaborate(self, platform)
376
377 result = _spec(self.stage.ospec, "r_tmp")
378 r_data = _spec(self.stage.ospec, "r_data")
379
380 # establish some combinatorial temporaries
381 o_n_validn = Signal(reset_less=True)
382 n_i_ready = Signal(reset_less=True, name="n_i_rdy_data")
383 nir_por = Signal(reset_less=True)
384 nir_por_n = Signal(reset_less=True)
385 p_i_valid = Signal(reset_less=True)
386 nir_novn = Signal(reset_less=True)
387 nirn_novn = Signal(reset_less=True)
388 por_pivn = Signal(reset_less=True)
389 npnn = Signal(reset_less=True)
390 self.m.d.comb += [p_i_valid.eq(self.p.i_valid_test),
391 o_n_validn.eq(~self.n.o_valid),
392 n_i_ready.eq(self.n.i_ready_test),
393 nir_por.eq(n_i_ready & self.p._o_ready),
394 nir_por_n.eq(n_i_ready & ~self.p._o_ready),
395 nir_novn.eq(n_i_ready | o_n_validn),
396 nirn_novn.eq(~n_i_ready & o_n_validn),
397 npnn.eq(nir_por | nirn_novn),
398 por_pivn.eq(self.p._o_ready & ~p_i_valid)
399 ]
400
401 # store result of processing in combinatorial temporary
402 self.m.d.comb += nmoperator.eq(result, self.data_r)
403
404 # if not in stall condition, update the temporary register
405 with self.m.If(self.p.o_ready): # not stalled
406 self.m.d.sync += nmoperator.eq(r_data, result) # update buffer
407
408 # data pass-through conditions
409 with self.m.If(npnn):
410 # XXX TBD, does nothing right now
411 o_data = self._postprocess(result)
412 self.m.d.sync += [self.n.o_valid.eq(p_i_valid), # valid if p_valid
413 # update out
414 nmoperator.eq(self.n.o_data, o_data),
415 ]
416 # buffer flush conditions (NOTE: can override data passthru conditions)
417 with self.m.If(nir_por_n): # not stalled
418 # Flush the [already processed] buffer to the output port.
419 # XXX TBD, does nothing right now
420 o_data = self._postprocess(r_data)
421 self.m.d.sync += [self.n.o_valid.eq(1), # reg empty
422 nmoperator.eq(self.n.o_data, o_data), # flush
423 ]
424 # output ready conditions
425 self.m.d.sync += self.p._o_ready.eq(nir_novn | por_pivn)
426
427 return self.m
428
429
430 class MaskNoDelayCancellable(ControlBase):
431 """ Mask-activated Cancellable pipeline (that does not respect "ready")
432
433 Based on (identical behaviour to) SimpleHandshake.
434 TODO: decide whether to merge *into* SimpleHandshake.
435
436 Argument: stage. see Stage API above
437
438 stage-1 p.i_valid >>in stage n.o_valid out>> stage+1
439 stage-1 p.o_ready <<out stage n.i_ready <<in stage+1
440 stage-1 p.i_data >>in stage n.o_data out>> stage+1
441 | |
442 +--process->--^
443 """
444
445 def __init__(self, stage, maskwid, in_multi=None, stage_ctl=False):
446 ControlBase.__init__(self, stage, in_multi, stage_ctl, maskwid)
447
448 def elaborate(self, platform):
449 self.m = m = ControlBase.elaborate(self, platform)
450
451 # store result of processing in combinatorial temporary
452 result = _spec(self.stage.ospec, "r_tmp")
453 m.d.comb += nmoperator.eq(result, self.data_r)
454
455 # establish if the data should be passed on. cancellation is
456 # a global signal.
457 # XXX EXCEPTIONAL CIRCUMSTANCES: inspection of the data payload
458 # is NOT "normal" for the Stage API.
459 p_i_valid = Signal(reset_less=True)
460 #print ("self.p.i_data", self.p.i_data)
461 maskedout = Signal(len(self.p.mask_i), reset_less=True)
462 m.d.comb += maskedout.eq(self.p.mask_i & ~self.p.stop_i)
463 m.d.comb += p_i_valid.eq(maskedout.bool())
464
465 # if idmask nonzero, mask gets passed on (and register set).
466 # register is left as-is if idmask is zero, but out-mask is set to zero
467 # note however: only the *uncancelled* mask bits get passed on
468 m.d.sync += self.n.o_valid.eq(p_i_valid)
469 m.d.sync += self.n.mask_o.eq(Mux(p_i_valid, maskedout, 0))
470 with m.If(p_i_valid):
471 # XXX TBD, does nothing right now
472 o_data = self._postprocess(result)
473 m.d.sync += nmoperator.eq(self.n.o_data, o_data) # update output
474
475 # output valid if
476 # input always "ready"
477 #m.d.comb += self.p._o_ready.eq(self.n.i_ready_test)
478 m.d.comb += self.p._o_ready.eq(Const(1))
479
480 # always pass on stop (as combinatorial: single signal)
481 m.d.comb += self.n.stop_o.eq(self.p.stop_i)
482
483 return self.m
484
485
486 class MaskCancellable(ControlBase):
487 """ Mask-activated Cancellable pipeline
488
489 Arguments:
490
491 * stage. see Stage API above
492 * maskwid - sets up cancellation capability (mask and stop).
493 * in_multi
494 * stage_ctl
495 * dynamic - allows switching from sync to combinatorial (passthrough)
496 USE WITH CARE. will need the entire pipe to be quiescent
497 before switching, otherwise data WILL be destroyed.
498
499 stage-1 p.i_valid >>in stage n.o_valid out>> stage+1
500 stage-1 p.o_ready <<out stage n.i_ready <<in stage+1
501 stage-1 p.i_data >>in stage n.o_data out>> stage+1
502 | |
503 +--process->--^
504 """
505
506 def __init__(self, stage, maskwid, in_multi=None, stage_ctl=False,
507 dynamic=False):
508 ControlBase.__init__(self, stage, in_multi, stage_ctl, maskwid)
509 self.dynamic = dynamic
510 if dynamic:
511 self.latchmode = Signal()
512 else:
513 self.latchmode = Const(1)
514
515 def elaborate(self, platform):
516 self.m = m = ControlBase.elaborate(self, platform)
517
518 mask_r = Signal(len(self.p.mask_i), reset_less=True)
519 data_r = _spec(self.stage.ospec, "data_r")
520 m.d.comb += nmoperator.eq(data_r, self._postprocess(self.data_r))
521
522 with m.If(self.latchmode):
523 r_busy = Signal()
524 r_latch = _spec(self.stage.ospec, "r_latch")
525
526 # establish if the data should be passed on. cancellation is
527 # a global signal.
528 p_i_valid = Signal(reset_less=True)
529 #print ("self.p.i_data", self.p.i_data)
530 maskedout = Signal(len(self.p.mask_i), reset_less=True)
531 m.d.comb += maskedout.eq(self.p.mask_i & ~self.p.stop_i)
532
533 # establish some combinatorial temporaries
534 n_i_ready = Signal(reset_less=True, name="n_i_rdy_data")
535 p_i_valid_p_o_ready = Signal(reset_less=True)
536 m.d.comb += [p_i_valid.eq(self.p.i_valid_test & maskedout.bool()),
537 n_i_ready.eq(self.n.i_ready_test),
538 p_i_valid_p_o_ready.eq(p_i_valid & self.p.o_ready),
539 ]
540
541 # if idmask nonzero, mask gets passed on (and register set).
542 # register is left as-is if idmask is zero, but out-mask is set to
543 # zero
544 # note however: only the *uncancelled* mask bits get passed on
545 m.d.sync += mask_r.eq(Mux(p_i_valid, maskedout, 0))
546 m.d.comb += self.n.mask_o.eq(mask_r)
547
548 # always pass on stop (as combinatorial: single signal)
549 m.d.comb += self.n.stop_o.eq(self.p.stop_i)
550
551 stor = Signal(reset_less=True)
552 m.d.comb += stor.eq(p_i_valid_p_o_ready | n_i_ready)
553 with m.If(stor):
554 # store result of processing in combinatorial temporary
555 m.d.sync += nmoperator.eq(r_latch, data_r)
556
557 # previous valid and ready
558 with m.If(p_i_valid_p_o_ready):
559 m.d.sync += r_busy.eq(1) # output valid
560 # previous invalid or not ready, however next is accepting
561 with m.Elif(n_i_ready):
562 m.d.sync += r_busy.eq(0) # ...so set output invalid
563
564 # output set combinatorially from latch
565 m.d.comb += nmoperator.eq(self.n.o_data, r_latch)
566
567 m.d.comb += self.n.o_valid.eq(r_busy)
568 # if next is ready, so is previous
569 m.d.comb += self.p._o_ready.eq(n_i_ready)
570
571 with m.Else():
572 # pass everything straight through. p connected to n: data,
573 # valid, mask, everything. this is "effectively" just a
574 # StageChain: MaskCancellable is doing "nothing" except
575 # combinatorially passing everything through
576 # (except now it's *dynamically selectable* whether to do that)
577 m.d.comb += self.n.o_valid.eq(self.p.i_valid_test)
578 m.d.comb += self.p._o_ready.eq(self.n.i_ready_test)
579 m.d.comb += self.n.stop_o.eq(self.p.stop_i)
580 m.d.comb += self.n.mask_o.eq(self.p.mask_i)
581 m.d.comb += nmoperator.eq(self.n.o_data, data_r)
582
583 return self.m
584
585
586 class SimpleHandshake(ControlBase):
587 """ simple handshake control. data and strobe signals travel in sync.
588 implements the protocol used by Wishbone and AXI4.
589
590 Argument: stage. see Stage API above
591
592 stage-1 p.i_valid >>in stage n.o_valid out>> stage+1
593 stage-1 p.o_ready <<out stage n.i_ready <<in stage+1
594 stage-1 p.i_data >>in stage n.o_data out>> stage+1
595 | |
596 +--process->--^
597 Truth Table
598
599 Inputs Temporary Output Data
600 ------- ---------- ----- ----
601 P P N N PiV& ~NiR& N P
602 i o i o PoR NoV o o
603 V R R V V R
604
605 ------- - - - -
606 0 0 0 0 0 0 >0 0 reg
607 0 0 0 1 0 1 >1 0 reg
608 0 0 1 0 0 0 0 1 process(i_data)
609 0 0 1 1 0 0 0 1 process(i_data)
610 ------- - - - -
611 0 1 0 0 0 0 >0 0 reg
612 0 1 0 1 0 1 >1 0 reg
613 0 1 1 0 0 0 0 1 process(i_data)
614 0 1 1 1 0 0 0 1 process(i_data)
615 ------- - - - -
616 1 0 0 0 0 0 >0 0 reg
617 1 0 0 1 0 1 >1 0 reg
618 1 0 1 0 0 0 0 1 process(i_data)
619 1 0 1 1 0 0 0 1 process(i_data)
620 ------- - - - -
621 1 1 0 0 1 0 1 0 process(i_data)
622 1 1 0 1 1 1 1 0 process(i_data)
623 1 1 1 0 1 0 1 1 process(i_data)
624 1 1 1 1 1 0 1 1 process(i_data)
625 ------- - - - -
626 """
627
628 def elaborate(self, platform):
629 self.m = m = ControlBase.elaborate(self, platform)
630
631 r_busy = Signal()
632 result = _spec(self.stage.ospec, "r_tmp")
633
634 # establish some combinatorial temporaries
635 n_i_ready = Signal(reset_less=True, name="n_i_rdy_data")
636 p_i_valid_p_o_ready = Signal(reset_less=True)
637 p_i_valid = Signal(reset_less=True)
638 m.d.comb += [p_i_valid.eq(self.p.i_valid_test),
639 n_i_ready.eq(self.n.i_ready_test),
640 p_i_valid_p_o_ready.eq(p_i_valid & self.p.o_ready),
641 ]
642
643 # store result of processing in combinatorial temporary
644 m.d.comb += nmoperator.eq(result, self.data_r)
645
646 # previous valid and ready
647 with m.If(p_i_valid_p_o_ready):
648 # XXX TBD, does nothing right now
649 o_data = self._postprocess(result)
650 m.d.sync += [r_busy.eq(1), # output valid
651 nmoperator.eq(self.n.o_data, o_data), # update output
652 ]
653 # previous invalid or not ready, however next is accepting
654 with m.Elif(n_i_ready):
655 # XXX TBD, does nothing right now
656 o_data = self._postprocess(result)
657 m.d.sync += [nmoperator.eq(self.n.o_data, o_data)]
658 # TODO: could still send data here (if there was any)
659 # m.d.sync += self.n.o_valid.eq(0) # ...so set output invalid
660 m.d.sync += r_busy.eq(0) # ...so set output invalid
661
662 m.d.comb += self.n.o_valid.eq(r_busy)
663 # if next is ready, so is previous
664 m.d.comb += self.p._o_ready.eq(n_i_ready)
665
666 return self.m
667
668
669 class UnbufferedPipeline(ControlBase):
670 """ A simple pipeline stage with single-clock synchronisation
671 and two-way valid/ready synchronised signalling.
672
673 Note that a stall in one stage will result in the entire pipeline
674 chain stalling.
675
676 Also that unlike BufferedHandshake, the valid/ready signalling does NOT
677 travel synchronously with the data: the valid/ready signalling
678 combines in a *combinatorial* fashion. Therefore, a long pipeline
679 chain will lengthen propagation delays.
680
681 Argument: stage. see Stage API, above
682
683 stage-1 p.i_valid >>in stage n.o_valid out>> stage+1
684 stage-1 p.o_ready <<out stage n.i_ready <<in stage+1
685 stage-1 p.i_data >>in stage n.o_data out>> stage+1
686 | |
687 r_data result
688 | |
689 +--process ->-+
690
691 Attributes:
692 -----------
693 p.i_data : StageInput, shaped according to ispec
694 The pipeline input
695 p.o_data : StageOutput, shaped according to ospec
696 The pipeline output
697 r_data : input_shape according to ispec
698 A temporary (buffered) copy of a prior (valid) input.
699 This is HELD if the output is not ready. It is updated
700 SYNCHRONOUSLY.
701 result: output_shape according to ospec
702 The output of the combinatorial logic. it is updated
703 COMBINATORIALLY (no clock dependence).
704
705 Truth Table
706
707 Inputs Temp Output Data
708 ------- - ----- ----
709 P P N N ~NiR& N P
710 i o i o NoV o o
711 V R R V V R
712
713 ------- - - -
714 0 0 0 0 0 0 1 reg
715 0 0 0 1 1 1 0 reg
716 0 0 1 0 0 0 1 reg
717 0 0 1 1 0 0 1 reg
718 ------- - - -
719 0 1 0 0 0 0 1 reg
720 0 1 0 1 1 1 0 reg
721 0 1 1 0 0 0 1 reg
722 0 1 1 1 0 0 1 reg
723 ------- - - -
724 1 0 0 0 0 1 1 reg
725 1 0 0 1 1 1 0 reg
726 1 0 1 0 0 1 1 reg
727 1 0 1 1 0 1 1 reg
728 ------- - - -
729 1 1 0 0 0 1 1 process(i_data)
730 1 1 0 1 1 1 0 process(i_data)
731 1 1 1 0 0 1 1 process(i_data)
732 1 1 1 1 0 1 1 process(i_data)
733 ------- - - -
734
735 Note: PoR is *NOT* involved in the above decision-making.
736 """
737
738 def elaborate(self, platform):
739 self.m = m = ControlBase.elaborate(self, platform)
740
741 data_valid = Signal() # is data valid or not
742 r_data = _spec(self.stage.ospec, "r_tmp") # output type
743
744 # some temporaries
745 p_i_valid = Signal(reset_less=True)
746 pv = Signal(reset_less=True)
747 buf_full = Signal(reset_less=True)
748 m.d.comb += p_i_valid.eq(self.p.i_valid_test)
749 m.d.comb += pv.eq(self.p.i_valid & self.p.o_ready)
750 m.d.comb += buf_full.eq(~self.n.i_ready_test & data_valid)
751
752 m.d.comb += self.n.o_valid.eq(data_valid)
753 m.d.comb += self.p._o_ready.eq(~data_valid | self.n.i_ready_test)
754 m.d.sync += data_valid.eq(p_i_valid | buf_full)
755
756 with m.If(pv):
757 m.d.sync += nmoperator.eq(r_data, self.data_r)
758 o_data = self._postprocess(r_data) # XXX TBD, does nothing right now
759 m.d.comb += nmoperator.eq(self.n.o_data, o_data)
760
761 return self.m
762
763
764 class UnbufferedPipeline2(ControlBase):
765 """ A simple pipeline stage with single-clock synchronisation
766 and two-way valid/ready synchronised signalling.
767
768 Note that a stall in one stage will result in the entire pipeline
769 chain stalling.
770
771 Also that unlike BufferedHandshake, the valid/ready signalling does NOT
772 travel synchronously with the data: the valid/ready signalling
773 combines in a *combinatorial* fashion. Therefore, a long pipeline
774 chain will lengthen propagation delays.
775
776 Argument: stage. see Stage API, above
777
778 stage-1 p.i_valid >>in stage n.o_valid out>> stage+1
779 stage-1 p.o_ready <<out stage n.i_ready <<in stage+1
780 stage-1 p.i_data >>in stage n.o_data out>> stage+1
781 | | |
782 +- process-> buf <-+
783 Attributes:
784 -----------
785 p.i_data : StageInput, shaped according to ispec
786 The pipeline input
787 p.o_data : StageOutput, shaped according to ospec
788 The pipeline output
789 buf : output_shape according to ospec
790 A temporary (buffered) copy of a valid output
791 This is HELD if the output is not ready. It is updated
792 SYNCHRONOUSLY.
793
794 Inputs Temp Output Data
795 ------- - -----
796 P P N N ~NiR& N P (buf_full)
797 i o i o NoV o o
798 V R R V V R
799
800 ------- - - -
801 0 0 0 0 0 0 1 process(i_data)
802 0 0 0 1 1 1 0 reg (odata, unchanged)
803 0 0 1 0 0 0 1 process(i_data)
804 0 0 1 1 0 0 1 process(i_data)
805 ------- - - -
806 0 1 0 0 0 0 1 process(i_data)
807 0 1 0 1 1 1 0 reg (odata, unchanged)
808 0 1 1 0 0 0 1 process(i_data)
809 0 1 1 1 0 0 1 process(i_data)
810 ------- - - -
811 1 0 0 0 0 1 1 process(i_data)
812 1 0 0 1 1 1 0 reg (odata, unchanged)
813 1 0 1 0 0 1 1 process(i_data)
814 1 0 1 1 0 1 1 process(i_data)
815 ------- - - -
816 1 1 0 0 0 1 1 process(i_data)
817 1 1 0 1 1 1 0 reg (odata, unchanged)
818 1 1 1 0 0 1 1 process(i_data)
819 1 1 1 1 0 1 1 process(i_data)
820 ------- - - -
821
822 Note: PoR is *NOT* involved in the above decision-making.
823 """
824
825 def elaborate(self, platform):
826 self.m = m = ControlBase.elaborate(self, platform)
827
828 buf_full = Signal() # is data valid or not
829 buf = _spec(self.stage.ospec, "r_tmp") # output type
830
831 # some temporaries
832 p_i_valid = Signal(reset_less=True)
833 m.d.comb += p_i_valid.eq(self.p.i_valid_test)
834
835 m.d.comb += self.n.o_valid.eq(buf_full | p_i_valid)
836 m.d.comb += self.p._o_ready.eq(~buf_full)
837 m.d.sync += buf_full.eq(~self.n.i_ready_test & self.n.o_valid)
838
839 o_data = Mux(buf_full, buf, self.data_r)
840 o_data = self._postprocess(o_data) # XXX TBD, does nothing right now
841 m.d.comb += nmoperator.eq(self.n.o_data, o_data)
842 m.d.sync += nmoperator.eq(buf, self.n.o_data)
843
844 return self.m
845
846
847 class PassThroughHandshake(ControlBase):
848 """ A control block that delays by one clock cycle.
849
850 Inputs Temporary Output Data
851 ------- ------------------ ----- ----
852 P P N N PiV& PiV| NiR| pvr N P (pvr)
853 i o i o PoR ~PoR ~NoV o o
854 V R R V V R
855
856 ------- - - - - - -
857 0 0 0 0 0 1 1 0 1 1 odata (unchanged)
858 0 0 0 1 0 1 0 0 1 0 odata (unchanged)
859 0 0 1 0 0 1 1 0 1 1 odata (unchanged)
860 0 0 1 1 0 1 1 0 1 1 odata (unchanged)
861 ------- - - - - - -
862 0 1 0 0 0 0 1 0 0 1 odata (unchanged)
863 0 1 0 1 0 0 0 0 0 0 odata (unchanged)
864 0 1 1 0 0 0 1 0 0 1 odata (unchanged)
865 0 1 1 1 0 0 1 0 0 1 odata (unchanged)
866 ------- - - - - - -
867 1 0 0 0 0 1 1 1 1 1 process(in)
868 1 0 0 1 0 1 0 0 1 0 odata (unchanged)
869 1 0 1 0 0 1 1 1 1 1 process(in)
870 1 0 1 1 0 1 1 1 1 1 process(in)
871 ------- - - - - - -
872 1 1 0 0 1 1 1 1 1 1 process(in)
873 1 1 0 1 1 1 0 0 1 0 odata (unchanged)
874 1 1 1 0 1 1 1 1 1 1 process(in)
875 1 1 1 1 1 1 1 1 1 1 process(in)
876 ------- - - - - - -
877
878 """
879
880 def elaborate(self, platform):
881 self.m = m = ControlBase.elaborate(self, platform)
882
883 r_data = _spec(self.stage.ospec, "r_tmp") # output type
884
885 # temporaries
886 p_i_valid = Signal(reset_less=True)
887 pvr = Signal(reset_less=True)
888 m.d.comb += p_i_valid.eq(self.p.i_valid_test)
889 m.d.comb += pvr.eq(p_i_valid & self.p.o_ready)
890
891 m.d.comb += self.p.o_ready.eq(~self.n.o_valid | self.n.i_ready_test)
892 m.d.sync += self.n.o_valid.eq(p_i_valid | ~self.p.o_ready)
893
894 odata = Mux(pvr, self.data_r, r_data)
895 m.d.sync += nmoperator.eq(r_data, odata)
896 r_data = self._postprocess(r_data) # XXX TBD, does nothing right now
897 m.d.comb += nmoperator.eq(self.n.o_data, r_data)
898
899 return m
900
901
902 class RegisterPipeline(UnbufferedPipeline):
903 """ A pipeline stage that delays by one clock cycle, creating a
904 sync'd latch out of o_data and o_valid as an indirect byproduct
905 of using PassThroughStage
906 """
907
908 def __init__(self, iospecfn):
909 UnbufferedPipeline.__init__(self, PassThroughStage(iospecfn))
910
911
912 class FIFOControl(ControlBase):
913 """ FIFO Control. Uses Queue to store data, coincidentally
914 happens to have same valid/ready signalling as Stage API.
915
916 i_data -> fifo.din -> FIFO -> fifo.dout -> o_data
917 """
918
919 def __init__(self, depth, stage, in_multi=None, stage_ctl=False,
920 fwft=True, pipe=False):
921 """ FIFO Control
922
923 * :depth: number of entries in the FIFO
924 * :stage: data processing block
925 * :fwft: first word fall-thru mode (non-fwft introduces delay)
926 * :pipe: specifies pipe mode.
927
928 when fwft = True it indicates that transfers may occur
929 combinatorially through stage processing in the same clock cycle.
930 This requires that the Stage be a Moore FSM:
931 https://en.wikipedia.org/wiki/Moore_machine
932
933 when fwft = False it indicates that all output signals are
934 produced only from internal registers or memory, i.e. that the
935 Stage is a Mealy FSM:
936 https://en.wikipedia.org/wiki/Mealy_machine
937
938 data is processed (and located) as follows:
939
940 self.p self.stage temp fn temp fn temp fp self.n
941 i_data->process()->result->cat->din.FIFO.dout->cat(o_data)
942
943 yes, really: cat produces a Cat() which can be assigned to.
944 this is how the FIFO gets de-catted without needing a de-cat
945 function
946 """
947 self.fwft = fwft
948 self.pipe = pipe
949 self.fdepth = depth
950 ControlBase.__init__(self, stage, in_multi, stage_ctl)
951
952 def elaborate(self, platform):
953 self.m = m = ControlBase.elaborate(self, platform)
954
955 # make a FIFO with a signal of equal width to the o_data.
956 (fwidth, _) = nmoperator.shape(self.n.o_data)
957 fifo = Queue(fwidth, self.fdepth, fwft=self.fwft, pipe=self.pipe)
958 m.submodules.fifo = fifo
959
960 def processfn(i_data):
961 # store result of processing in combinatorial temporary
962 result = _spec(self.stage.ospec, "r_temp")
963 m.d.comb += nmoperator.eq(result, self.process(i_data))
964 return nmoperator.cat(result)
965
966 # prev: make the FIFO (Queue object) "look" like a PrevControl...
967 m.submodules.fp = fp = PrevControl()
968 fp.i_valid, fp._o_ready, fp.i_data = fifo.w_en, fifo.w_rdy, fifo.w_data
969 m.d.comb += fp._connect_in(self.p, fn=processfn)
970
971 # next: make the FIFO (Queue object) "look" like a NextControl...
972 m.submodules.fn = fn = NextControl()
973 fn.o_valid, fn.i_ready, fn.o_data = fifo.r_rdy, fifo.r_en, fifo.r_data
974 connections = fn._connect_out(self.n, fn=nmoperator.cat)
975 valid_eq, ready_eq, o_data = connections
976
977 # ok ok so we can't just do the ready/valid eqs straight:
978 # first 2 from connections are the ready/valid, 3rd is data.
979 if self.fwft:
980 # combinatorial on next ready/valid
981 m.d.comb += [valid_eq, ready_eq]
982 else:
983 m.d.sync += [valid_eq, ready_eq] # non-fwft mode needs sync
984 o_data = self._postprocess(o_data) # XXX TBD, does nothing right now
985 m.d.comb += o_data
986
987 return m
988
989
990 # aka "RegStage".
991 class UnbufferedPipeline(FIFOControl):
992 def __init__(self, stage, in_multi=None, stage_ctl=False):
993 FIFOControl.__init__(self, 1, stage, in_multi, stage_ctl,
994 fwft=True, pipe=False)
995
996 # aka "BreakReadyStage" XXX had to set fwft=True to get it to work
997
998
999 class PassThroughHandshake(FIFOControl):
1000 def __init__(self, stage, in_multi=None, stage_ctl=False):
1001 FIFOControl.__init__(self, 1, stage, in_multi, stage_ctl,
1002 fwft=True, pipe=True)
1003
1004 # this is *probably* BufferedHandshake, although test #997 now succeeds.
1005
1006
1007 class BufferedHandshake(FIFOControl):
1008 def __init__(self, stage, in_multi=None, stage_ctl=False):
1009 FIFOControl.__init__(self, 2, stage, in_multi, stage_ctl,
1010 fwft=True, pipe=False)
1011
1012
1013 """
1014 # this is *probably* SimpleHandshake (note: memory cell size=0)
1015 class SimpleHandshake(FIFOControl):
1016 def __init__(self, stage, in_multi=None, stage_ctl=False):
1017 FIFOControl.__init__(self, 0, stage, in_multi, stage_ctl,
1018 fwft=True, pipe=False)
1019 """