respect Ready/Valid signalling (stall capability) in MaskCancellable
[ieee754fpu.git] / src / nmutil / singlepipe.py
1 """ Pipeline API. For multi-input and multi-output variants, see multipipe.
2
3 Associated development bugs:
4 * http://bugs.libre-riscv.org/show_bug.cgi?id=64
5 * http://bugs.libre-riscv.org/show_bug.cgi?id=57
6
7 Important: see Stage API (stageapi.py) in combination with below
8
9 RecordBasedStage:
10 ----------------
11
12 A convenience class that takes an input shape, output shape, a
13 "processing" function and an optional "setup" function. Honestly
14 though, there's not much more effort to just... create a class
15 that returns a couple of Records (see ExampleAddRecordStage in
16 examples).
17
18 PassThroughStage:
19 ----------------
20
21 A convenience class that takes a single function as a parameter,
22 that is chain-called to create the exact same input and output spec.
23 It has a process() function that simply returns its input.
24
25 Instances of this class are completely redundant if handed to
26 StageChain, however when passed to UnbufferedPipeline they
27 can be used to introduce a single clock delay.
28
29 ControlBase:
30 -----------
31
32 The base class for pipelines. Contains previous and next ready/valid/data.
33 Also has an extremely useful "connect" function that can be used to
34 connect a chain of pipelines and present the exact same prev/next
35 ready/valid/data API.
36
37 Note: pipelines basically do not become pipelines as such until
38 handed to a derivative of ControlBase. ControlBase itself is *not*
39 strictly considered a pipeline class. Wishbone and AXI4 (master or
40 slave) could be derived from ControlBase, for example.
41 UnbufferedPipeline:
42 ------------------
43
44 A simple stalling clock-synchronised pipeline that has no buffering
45 (unlike BufferedHandshake). Data flows on *every* clock cycle when
46 the conditions are right (this is nominally when the input is valid
47 and the output is ready).
48
49 A stall anywhere along the line will result in a stall back-propagating
50 down the entire chain. The BufferedHandshake by contrast will buffer
51 incoming data, allowing previous stages one clock cycle's grace before
52 also having to stall.
53
54 An advantage of the UnbufferedPipeline over the Buffered one is
55 that the amount of logic needed (number of gates) is greatly
56 reduced (no second set of buffers basically)
57
58 The disadvantage of the UnbufferedPipeline is that the valid/ready
59 logic, if chained together, is *combinatorial*, resulting in
60 progressively larger gate delay.
61
62 PassThroughHandshake:
63 ------------------
64
65 A Control class that introduces a single clock delay, passing its
66 data through unaltered. Unlike RegisterPipeline (which relies
67 on UnbufferedPipeline and PassThroughStage) it handles ready/valid
68 itself.
69
70 RegisterPipeline:
71 ----------------
72
73 A convenience class: because UnbufferedPipeline introduces a single
74 clock delay, giving it a PassThroughStage results in a Pipeline
75 stage that, duh, delays its (unmodified) input by one clock cycle.
76
77 BufferedHandshake:
78 ----------------
79
80 nmigen implementation of buffered pipeline stage, based on zipcpu:
81 https://zipcpu.com/blog/2017/08/14/strategies-for-pipelining.html
82
83 this module requires quite a bit of thought to understand how it works
84 (and why it is needed in the first place). reading the above is
85 *strongly* recommended.
86
87 unlike john dawson's IEEE754 FPU STB/ACK signalling, which requires
88 the STB / ACK signals to raise and lower (on separate clocks) before
89 data may proceed (thus only allowing one piece of data to proceed
90 on *ALTERNATE* cycles), the signalling here is a true pipeline
91 where data will flow on *every* clock when the conditions are right.
92
93 input acceptance conditions are when:
94 * incoming previous-stage strobe (p.valid_i) is HIGH
95 * outgoing previous-stage ready (p.ready_o) is HIGH
96
97 output transmission conditions are when:
98 * outgoing next-stage strobe (n.valid_o) is HIGH
99 * incoming next-stage ready (n.ready_i) is HIGH
100
101 the tricky bit is when the input has valid data and the output is not
102 ready to accept it. if it wasn't for the clock synchronisation, it
103 would be possible to tell the input "hey don't send that data, we're
104 not ready". unfortunately, it's not possible to "change the past":
105 the previous stage *has no choice* but to pass on its data.
106
107 therefore, the incoming data *must* be accepted - and stored: that
108 is the responsibility / contract that this stage *must* accept.
109 on the same clock, it's possible to tell the input that it must
110 not send any more data. this is the "stall" condition.
111
112 we now effectively have *two* possible pieces of data to "choose" from:
113 the buffered data, and the incoming data. the decision as to which
114 to process and output is based on whether we are in "stall" or not.
115 i.e. when the next stage is no longer ready, the output comes from
116 the buffer if a stall had previously occurred, otherwise it comes
117 direct from processing the input.
118
119 this allows us to respect a synchronous "travelling STB" with what
120 dan calls a "buffered handshake".
121
122 it's quite a complex state machine!
123
124 SimpleHandshake
125 ---------------
126
127 Synchronised pipeline, based on:
128 https://github.com/ZipCPU/dbgbus/blob/master/hexbus/rtl/hbdeword.v
129 """
130
131 from nmigen import Signal, Mux, Module, Elaboratable, Const
132 from nmigen.cli import verilog, rtlil
133 from nmigen.hdl.rec import Record
134
135 from nmutil.queue import Queue
136 import inspect
137
138 from nmutil.iocontrol import (PrevControl, NextControl, Object, RecordObject)
139 from nmutil.stageapi import (_spec, StageCls, Stage, StageChain, StageHelper)
140 from nmutil import nmoperator
141
142
143 class RecordBasedStage(Stage):
144 """ convenience class which provides a Records-based layout.
145 honestly it's a lot easier just to create a direct Records-based
146 class (see ExampleAddRecordStage)
147 """
148 def __init__(self, in_shape, out_shape, processfn, setupfn=None):
149 self.in_shape = in_shape
150 self.out_shape = out_shape
151 self.__process = processfn
152 self.__setup = setupfn
153 def ispec(self): return Record(self.in_shape)
154 def ospec(self): return Record(self.out_shape)
155 def process(self, i): return self.__process(i)
156 def setup(self, m, i): return self.__setup(m, i)
157
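# A minimal pure-Python sketch of the convenience-class pattern above, with
# no nmigen dependency. FnStage, its dict-based specs and the example lambda
# are hypothetical stand-ins (not part of nmutil): a plain dict plays the
# role of a Record. Guarding the optional setupfn is a choice made for this
# sketch, not something the class above does.

```python
class FnStage:
    """Hypothetical stand-in for RecordBasedStage (no nmigen needed)."""

    def __init__(self, in_shape, out_shape, processfn, setupfn=None):
        self.in_shape = in_shape
        self.out_shape = out_shape
        self._process = processfn
        self._setup = setupfn

    def ispec(self):
        # a fresh dict plays the role of Record(self.in_shape)
        return {name: 0 for name, _width in self.in_shape}

    def ospec(self):
        return {name: 0 for name, _width in self.out_shape}

    def process(self, i):
        return self._process(i)

    def setup(self, m, i):
        # setupfn is optional, so only call it when one was supplied
        if self._setup is not None:
            return self._setup(m, i)


add = FnStage([("a", 16), ("b", 16)], [("sum", 17)],
              lambda i: {"sum": i["a"] + i["b"]})
print(add.process({"a": 2, "b": 3}))
```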
158
159 class PassThroughStage(StageCls):
160 """ a pass-through stage with its input data spec identical to its output,
161 and "passes through" its data from input to output (does nothing).
162
163 use this basically to explicitly make any data spec Stage-compliant.
164 (many APIs would potentially use a static "wrap" method in e.g.
165 StageCls to achieve a similar effect)
166 """
167 def __init__(self, iospecfn): self.iospecfn = iospecfn
168 def ispec(self): return self.iospecfn()
169 def ospec(self): return self.iospecfn()
170
171
172 class ControlBase(StageHelper, Elaboratable):
173 """ Common functions for Pipeline API. Note: a "pipeline stage" only
174 exists (conceptually) when a ControlBase derivative is handed
175 a Stage (combinatorial block)
176
177 NOTE: ControlBase derives from StageHelper, making it accidentally
178 compliant with the Stage API. Using those functions directly
179 *BYPASSES* a ControlBase instance ready/valid signalling, which
180 clearly should not be done without a really, really good reason.
181 """
182 def __init__(self, stage=None, in_multi=None, stage_ctl=False, maskwid=0):
183 """ Base class containing ready/valid/data to previous and next stages
184
185 * p: contains ready/valid to the previous stage
186 * n: contains ready/valid to the next stage
187
188 Except when calling ControlBase.connect(), user must also:
189 * add data_i member to PrevControl (p) and
190 * add data_o member to NextControl (n)
191 Calling ControlBase._new_data is a good way to do that.
192 """
193 print ("ControlBase", self, stage, in_multi, stage_ctl)
194 StageHelper.__init__(self, stage)
195
196 # set up input and output IO ACK (prev/next ready/valid)
197 self.p = PrevControl(in_multi, stage_ctl, maskwid=maskwid)
198 self.n = NextControl(stage_ctl, maskwid=maskwid)
199
200 # set up the input and output data
201 if stage is not None:
202 self._new_data("data")
203
204 def _new_data(self, name):
205 """ allocates new data_i and data_o
206 """
207 self.p.data_i, self.n.data_o = self.new_specs(name)
208
209 @property
210 def data_r(self):
211 return self.process(self.p.data_i)
212
213 def connect_to_next(self, nxt):
214 """ helper function to connect to the next stage data/valid/ready.
215 """
216 return self.n.connect_to_next(nxt.p)
217
218 def _connect_in(self, prev):
219 """ internal helper function to connect stage to an input source.
220 do not use to connect stage-to-stage!
221 """
222 return self.p._connect_in(prev.p)
223
224 def _connect_out(self, nxt):
225 """ internal helper function to connect stage to an output source.
226 do not use to connect stage-to-stage!
227 """
228 return self.n._connect_out(nxt.n)
229
230 def connect(self, pipechain):
231 """ connects a chain (list) of Pipeline instances together and
232 links them to this ControlBase instance:
233
234 in <----> self <---> out
235 | ^
236 v |
237 [pipe1, pipe2, pipe3, pipe4]
238 | ^ | ^ | ^
239 v | v | v |
240 out---in out--in out---in
241
242 Also takes care of allocating data_i/data_o, by looking up
243 the data spec for each end of the pipechain. i.e. it is NOT
244 necessary to allocate self.p.data_i or self.n.data_o manually:
245 this is handled AUTOMATICALLY, here.
246
247 Basically this function is the direct equivalent of StageChain,
248 except that unlike StageChain, the Pipeline logic is followed.
249
250 Just as StageChain presents an object that conforms to the
251 Stage API from a list of objects that also conform to the
252 Stage API, an object that calls this Pipeline connect function
253 has the exact same pipeline API as the list of pipeline objects
254 it is called with.
255
256 Thus it becomes possible to build up larger chains recursively.
257 More complex chains (multi-input, multi-output) will have to be
258 done manually.
259
260 Argument:
261
262 * :pipechain: - a sequence of ControlBase-derived instances
263 (must be one or more in length)
264
265 Returns:
266
267 * a list of eq assignments that will need to be added in
268 an elaborate() to m.d.comb
269 """
270 assert len(pipechain) > 0, "pipechain must be non-zero length"
271 assert self.stage is None, "do not use connect with a stage"
272 eqs = [] # collated list of assignment statements
273
274 # connect inter-chain
275 for i in range(len(pipechain)-1):
276 pipe1 = pipechain[i] # earlier
277 pipe2 = pipechain[i+1] # later (by 1)
278 eqs += pipe1.connect_to_next(pipe2) # earlier n to later p
279
280 # connect front and back of chain to ourselves
281 front = pipechain[0] # first in chain
282 end = pipechain[-1] # last in chain
283 self.set_specs(front, end) # sets up ispec/ospec functions
284 self._new_data("chain") # NOTE: REPLACES existing data
285 eqs += front._connect_in(self) # front p to our p
286 eqs += end._connect_out(self) # end n to our n
287
288 return eqs
289
290 def set_input(self, i):
291 """ helper function to set the input data (used in unit tests)
292 """
293 return nmoperator.eq(self.p.data_i, i)
294
295 def __iter__(self):
296 yield from self.p # yields ready/valid/data (data also gets yielded)
297 yield from self.n # ditto
298
299 def ports(self):
300 return list(self)
301
302 def elaborate(self, platform):
303 """ handles case where stage has dynamic ready/valid functions
304 """
305 m = Module()
306 m.submodules.p = self.p
307 m.submodules.n = self.n
308
309 self.setup(m, self.p.data_i)
310
311 if not self.p.stage_ctl:
312 return m
313
314 # intercept the previous (outgoing) "ready", combine with stage ready
315 m.d.comb += self.p.s_ready_o.eq(self.p._ready_o & self.stage.d_ready)
316
317 # intercept the next (incoming) "ready" and combine it with data valid
318 sdv = self.stage.d_valid(self.n.ready_i)
319 m.d.comb += self.n.d_valid.eq(self.n.ready_i & sdv)
320
321 return m
322
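# The wiring order that connect() describes can be sketched without nmigen.
# This is only an illustration of which ends get linked: connect_plan and
# the string labels are hypothetical, and the real method returns nmigen
# eq assignments rather than name pairs.

```python
def connect_plan(chain):
    """Return the (source, destination) links for a chain of pipe names."""
    assert len(chain) > 0, "pipechain must be non-zero length"
    links = []
    # inter-chain: each earlier pipe's n side feeds the next pipe's p side
    for earlier, later in zip(chain, chain[1:]):
        links.append((earlier + ".n", later + ".p"))
    # the container's own p feeds the front, and the end feeds its n
    links.append(("self.p", chain[0] + ".p"))
    links.append((chain[-1] + ".n", "self.n"))
    return links


plan = connect_plan(["pipe1", "pipe2", "pipe3", "pipe4"])
for src, dst in plan:
    print(src, "->", dst)
```

# A chain of N pipes yields N-1 inter-chain links plus the two end links.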
323
324 class BufferedHandshake(ControlBase):
325 """ buffered pipeline stage. data and strobe signals travel in sync.
326 if ever the input is ready and the output is not, processed data
327 is shunted into a temporary register.
328
329 Argument: stage. see Stage API above
330
331 stage-1 p.valid_i >>in stage n.valid_o out>> stage+1
332 stage-1 p.ready_o <<out stage n.ready_i <<in stage+1
333 stage-1 p.data_i >>in stage n.data_o out>> stage+1
334 | |
335 process --->----^
336 | |
337 +-- r_data ->-+
338
339 input data p.data_i is read (only), is processed and goes into an
340 intermediate result store [process()]. this is updated combinatorially.
341
342 in a non-stall condition, the intermediate result will go into the
343 output (update_output). however if ever there is a stall, it goes
344 into r_data instead [update_buffer()].
345
346 when the non-stall condition is released, r_data is the first
347 to be transferred to the output [flush_buffer()], and the stall
348 condition cleared.
349
350 on the next cycle (as long as stall is not raised again) the
351 input may begin to be processed and transferred directly to output.
352 """
353
354 def elaborate(self, platform):
355 self.m = ControlBase.elaborate(self, platform)
356
357 result = _spec(self.stage.ospec, "r_tmp")
358 r_data = _spec(self.stage.ospec, "r_data")
359
360 # establish some combinatorial temporaries
361 o_n_validn = Signal(reset_less=True)
362 n_ready_i = Signal(reset_less=True, name="n_i_rdy_data")
363 nir_por = Signal(reset_less=True)
364 nir_por_n = Signal(reset_less=True)
365 p_valid_i = Signal(reset_less=True)
366 nir_novn = Signal(reset_less=True)
367 nirn_novn = Signal(reset_less=True)
368 por_pivn = Signal(reset_less=True)
369 npnn = Signal(reset_less=True)
370 self.m.d.comb += [p_valid_i.eq(self.p.valid_i_test),
371 o_n_validn.eq(~self.n.valid_o),
372 n_ready_i.eq(self.n.ready_i_test),
373 nir_por.eq(n_ready_i & self.p._ready_o),
374 nir_por_n.eq(n_ready_i & ~self.p._ready_o),
375 nir_novn.eq(n_ready_i | o_n_validn),
376 nirn_novn.eq(~n_ready_i & o_n_validn),
377 npnn.eq(nir_por | nirn_novn),
378 por_pivn.eq(self.p._ready_o & ~p_valid_i)
379 ]
380
381 # store result of processing in combinatorial temporary
382 self.m.d.comb += nmoperator.eq(result, self.data_r)
383
384 # if not in stall condition, update the temporary register
385 with self.m.If(self.p.ready_o): # not stalled
386 self.m.d.sync += nmoperator.eq(r_data, result) # update buffer
387
388 # data pass-through conditions
389 with self.m.If(npnn):
390 data_o = self._postprocess(result) # XXX TBD, does nothing right now
391 self.m.d.sync += [self.n.valid_o.eq(p_valid_i), # valid if p_valid
392 nmoperator.eq(self.n.data_o, data_o), # update out
393 ]
394 # buffer flush conditions (NOTE: can override data passthru conditions)
395 with self.m.If(nir_por_n): # was stalled, next is now ready
396 # Flush the [already processed] buffer to the output port.
397 data_o = self._postprocess(r_data) # XXX TBD, does nothing right now
398 self.m.d.sync += [self.n.valid_o.eq(1), # reg empty
399 nmoperator.eq(self.n.data_o, data_o), # flush
400 ]
401 # output ready conditions
402 self.m.d.sync += self.p._ready_o.eq(nir_novn | por_pivn)
403
404 return self.m
405
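# The buffering idea (accept the in-flight item into a spare register when
# the output stalls, then flush it first) can be tried out in plain Python.
# This is a textbook one-entry skid buffer written for illustration; it is
# NOT a transcription of the nmigen logic above, and SkidBuffer, run and
# ready_pattern are hypothetical names.

```python
class SkidBuffer:
    """Textbook one-entry skid buffer (behavioural model only)."""

    def __init__(self, process=lambda x: x):
        self.process = process
        self.out_valid = False   # output register holds valid data
        self.out_data = None
        self.buf_valid = False   # spare register in use (stalled)
        self.buf_data = None

    @property
    def ready_o(self):
        # upstream may send only while the spare register is free
        return not self.buf_valid

    def clock(self, in_valid, in_data, out_ready):
        """Advance one clock edge using the values seen before the edge."""
        accept = in_valid and self.ready_o
        if out_ready or not self.out_valid:
            # output register is free (empty or being taken this cycle)
            if self.buf_valid:
                # flush the buffered item first to preserve ordering
                self.out_data, self.out_valid = self.buf_data, True
                self.buf_valid = False
            elif accept:
                self.out_data, self.out_valid = self.process(in_data), True
            else:
                self.out_valid = False
        elif accept:
            # output is stalled: the incoming item must still be stored
            self.buf_data, self.buf_valid = self.process(in_data), True


def run(dut, items, ready_pattern, cycles=40):
    """Drive items through dut and return what the consumer received."""
    pending, received = list(items), []
    for cycle in range(cycles):
        out_ready = ready_pattern[cycle % len(ready_pattern)]
        in_valid = bool(pending)
        in_data = pending[0] if pending else None
        # sample both handshakes before the clock edge
        if dut.out_valid and out_ready:
            received.append(dut.out_data)
        took_input = in_valid and dut.ready_o
        dut.clock(in_valid, in_data, out_ready)
        if took_input:
            pending.pop(0)
    return received


got = run(SkidBuffer(lambda x: x + 100), [1, 2, 3, 4, 5],
          ready_pattern=[True, False, False, True])
print(got)
```

# With the consumer stalling two cycles out of four, every item still
# arrives exactly once and in order.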
406
407 class MaskNoDelayCancellable(ControlBase):
408 """ Mask-activated Cancellable pipeline (that does not respect "ready")
409
410 Based on (identical behaviour to) SimpleHandshake.
411 TODO: decide whether to merge *into* SimpleHandshake.
412
413 Argument: stage. see Stage API above
414
415 stage-1 p.valid_i >>in stage n.valid_o out>> stage+1
416 stage-1 p.ready_o <<out stage n.ready_i <<in stage+1
417 stage-1 p.data_i >>in stage n.data_o out>> stage+1
418 | |
419 +--process->--^
420 """
421 def __init__(self, stage, maskwid, in_multi=None, stage_ctl=False):
422 ControlBase.__init__(self, stage, in_multi, stage_ctl, maskwid)
423
424 def elaborate(self, platform):
425 self.m = m = ControlBase.elaborate(self, platform)
426
427 # store result of processing in combinatorial temporary
428 result = _spec(self.stage.ospec, "r_tmp")
429 m.d.comb += nmoperator.eq(result, self.data_r)
430
431 # establish if the data should be passed on. cancellation is
432 # a global signal.
433 # XXX EXCEPTIONAL CIRCUMSTANCES: inspection of the data payload
434 # is NOT "normal" for the Stage API.
435 p_valid_i = Signal(reset_less=True)
436 #print ("self.p.data_i", self.p.data_i)
437 maskedout = Signal(len(self.p.mask_i), reset_less=True)
438 m.d.comb += maskedout.eq(self.p.mask_i & ~self.p.stop_i)
439 m.d.comb += p_valid_i.eq(maskedout.bool())
440
441 # if idmask nonzero, mask gets passed on (and register set).
442 # register is left as-is if idmask is zero, but out-mask is set to zero
443 # note however: only the *uncancelled* mask bits get passed on
444 m.d.sync += self.n.valid_o.eq(p_valid_i)
445 m.d.sync += self.n.mask_o.eq(Mux(p_valid_i, maskedout, 0))
446 with m.If(p_valid_i):
447 data_o = self._postprocess(result) # XXX TBD, does nothing right now
448 m.d.sync += nmoperator.eq(self.n.data_o, data_o) # update output
449
450 # output valid if
451 # input always "ready"
452 #m.d.comb += self.p._ready_o.eq(self.n.ready_i_test)
453 m.d.comb += self.p._ready_o.eq(Const(1))
454
455 # always pass on stop (as combinatorial: single signal)
456 m.d.comb += self.n.stop_o.eq(self.p.stop_i)
457
458 return self.m
459
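# The cancellation rule used above (only uncancelled mask bits survive, and
# the output is valid only if at least one bit survives) can be checked
# with plain integers. masked_out and step are hypothetical helper names,
# and the explicit width truncation stands in for the fixed-width Signal.

```python
def masked_out(mask_i, stop_i, width):
    """Model of: maskedout = mask_i & ~stop_i, truncated to `width` bits."""
    all_ones = (1 << width) - 1
    return mask_i & ~stop_i & all_ones


def step(mask_i, stop_i, width):
    """Return (valid, mask_o) as registered on the next clock edge."""
    surviving = masked_out(mask_i, stop_i, width)
    valid = surviving != 0          # maskedout.bool()
    return valid, (surviving if valid else 0)


print(step(0b1011, 0b0010, 4))
print(step(0b0010, 0b0010, 4))
```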
460 class MaskCancellable(ControlBase):
461 """ Mask-activated Cancellable pipeline
462
463 Argument: stage. see Stage API above
464
465 stage-1 p.valid_i >>in stage n.valid_o out>> stage+1
466 stage-1 p.ready_o <<out stage n.ready_i <<in stage+1
467 stage-1 p.data_i >>in stage n.data_o out>> stage+1
468 | |
469 +--process->--^
470 """
471 def __init__(self, stage, maskwid, in_multi=None, stage_ctl=False):
472 ControlBase.__init__(self, stage, in_multi, stage_ctl, maskwid)
473
474 def elaborate(self, platform):
475 self.m = m = ControlBase.elaborate(self, platform)
476
477 r_busy = Signal()
478 result = _spec(self.stage.ospec, "r_tmp")
479
480 # establish if the data should be passed on. cancellation is
481 # a global signal.
482 p_valid_i = Signal(reset_less=True)
483 #print ("self.p.data_i", self.p.data_i)
484 maskedout = Signal(len(self.p.mask_i), reset_less=True)
485 m.d.comb += maskedout.eq(self.p.mask_i & ~self.p.stop_i)
486
487 # establish some combinatorial temporaries
488 n_ready_i = Signal(reset_less=True, name="n_i_rdy_data")
489 p_valid_i_p_ready_o = Signal(reset_less=True)
490 m.d.comb += [p_valid_i.eq(self.p.valid_i_test & maskedout.bool()),
491 n_ready_i.eq(self.n.ready_i_test),
492 p_valid_i_p_ready_o.eq(p_valid_i & self.p.ready_o),
493 ]
494
495 # store result of processing in combinatorial temporary
496 m.d.comb += nmoperator.eq(result, self.data_r)
497
498 # if idmask nonzero, mask gets passed on (and register set).
499 # register is left as-is if idmask is zero, but out-mask is set to zero
500 # note however: only the *uncancelled* mask bits get passed on
501 m.d.sync += self.n.mask_o.eq(Mux(p_valid_i, maskedout, 0))
502
503 # always pass on stop (as combinatorial: single signal)
504 m.d.comb += self.n.stop_o.eq(self.p.stop_i)
505
506 # previous valid and ready
507 with m.If(p_valid_i_p_ready_o):
508 data_o = self._postprocess(result) # XXX TBD, does nothing right now
509 m.d.sync += [r_busy.eq(1), # output valid
510 nmoperator.eq(self.n.data_o, data_o), # update output
511 ]
512 # previous invalid or not ready, however next is accepting
513 with m.Elif(n_ready_i):
514 data_o = self._postprocess(result) # XXX TBD, does nothing right now
515 m.d.sync += [nmoperator.eq(self.n.data_o, data_o)]
516 # TODO: could still send data here (if there was any)
517 #m.d.sync += self.n.valid_o.eq(0) # ...so set output invalid
518 m.d.sync += r_busy.eq(0) # ...so set output invalid
519
520 m.d.comb += self.n.valid_o.eq(r_busy)
521 # if next is ready, so is previous
522 m.d.comb += self.p._ready_o.eq(n_ready_i)
523
524 return self.m
525
526
527 class SimpleHandshake(ControlBase):
528 """ simple handshake control. data and strobe signals travel in sync.
529 implements the protocol used by Wishbone and AXI4.
530
531 Argument: stage. see Stage API above
532
533 stage-1 p.valid_i >>in stage n.valid_o out>> stage+1
534 stage-1 p.ready_o <<out stage n.ready_i <<in stage+1
535 stage-1 p.data_i >>in stage n.data_o out>> stage+1
536 | |
537 +--process->--^
538 Truth Table
539
540 Inputs Temporary Output Data
541 ------- ---------- ----- ----
542 P P N N PiV& ~NiR& N P
543 i o i o PoR NoV o o
544 V R R V V R
545
546 ------- - - - -
547 0 0 0 0 0 0 >0 0 reg
548 0 0 0 1 0 1 >1 0 reg
549 0 0 1 0 0 0 0 1 process(data_i)
550 0 0 1 1 0 0 0 1 process(data_i)
551 ------- - - - -
552 0 1 0 0 0 0 >0 0 reg
553 0 1 0 1 0 1 >1 0 reg
554 0 1 1 0 0 0 0 1 process(data_i)
555 0 1 1 1 0 0 0 1 process(data_i)
556 ------- - - - -
557 1 0 0 0 0 0 >0 0 reg
558 1 0 0 1 0 1 >1 0 reg
559 1 0 1 0 0 0 0 1 process(data_i)
560 1 0 1 1 0 0 0 1 process(data_i)
561 ------- - - - -
562 1 1 0 0 1 0 1 0 process(data_i)
563 1 1 0 1 1 1 1 0 process(data_i)
564 1 1 1 0 1 0 1 1 process(data_i)
565 1 1 1 1 1 0 1 1 process(data_i)
566 ------- - - - -
567 """
568
569 def elaborate(self, platform):
570 self.m = m = ControlBase.elaborate(self, platform)
571
572 r_busy = Signal()
573 result = _spec(self.stage.ospec, "r_tmp")
574
575 # establish some combinatorial temporaries
576 n_ready_i = Signal(reset_less=True, name="n_i_rdy_data")
577 p_valid_i_p_ready_o = Signal(reset_less=True)
578 p_valid_i = Signal(reset_less=True)
579 m.d.comb += [p_valid_i.eq(self.p.valid_i_test),
580 n_ready_i.eq(self.n.ready_i_test),
581 p_valid_i_p_ready_o.eq(p_valid_i & self.p.ready_o),
582 ]
583
584 # store result of processing in combinatorial temporary
585 m.d.comb += nmoperator.eq(result, self.data_r)
586
587 # previous valid and ready
588 with m.If(p_valid_i_p_ready_o):
589 data_o = self._postprocess(result) # XXX TBD, does nothing right now
590 m.d.sync += [r_busy.eq(1), # output valid
591 nmoperator.eq(self.n.data_o, data_o), # update output
592 ]
593 # previous invalid or not ready, however next is accepting
594 with m.Elif(n_ready_i):
595 data_o = self._postprocess(result) # XXX TBD, does nothing right now
596 m.d.sync += [nmoperator.eq(self.n.data_o, data_o)]
597 # TODO: could still send data here (if there was any)
598 #m.d.sync += self.n.valid_o.eq(0) # ...so set output invalid
599 m.d.sync += r_busy.eq(0) # ...so set output invalid
600
601 m.d.comb += self.n.valid_o.eq(r_busy)
602 # if next is ready, so is previous
603 m.d.comb += self.p._ready_o.eq(n_ready_i)
604
605 return self.m
606
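# A cycle-level Python model of the SimpleHandshake register logic above,
# useful for checking the stall behaviour by hand. It assumes stage_ctl is
# off (so valid_i_test / ready_i_test reduce to the plain signals); the
# class name, drive and ready_pattern are hypothetical.

```python
class SimpleHandshakeModel:
    """Behavioural model of the r_busy / data_o registers."""

    def __init__(self, process=lambda x: x):
        self.process = process
        self.r_busy = False     # drives n.valid_o
        self.data_o = None

    def clock(self, p_valid_i, p_data_i, n_ready_i):
        p_ready_o = n_ready_i               # if next is ready, so is previous
        if p_valid_i and p_ready_o:
            self.r_busy = True              # output valid
            self.data_o = self.process(p_data_i)
        elif n_ready_i:
            self.data_o = self.process(p_data_i)
            self.r_busy = False             # output invalid
        # otherwise (next not ready): hold r_busy and data_o unchanged


def drive(dut, items, ready_pattern, cycles=40):
    """Feed items in, honouring ready, and collect what downstream takes."""
    pending, received = list(items), []
    for cycle in range(cycles):
        n_ready_i = ready_pattern[cycle % len(ready_pattern)]
        p_valid_i = bool(pending)
        p_data_i = pending[0] if pending else 0
        if dut.r_busy and n_ready_i:        # downstream takes the output
            received.append(dut.data_o)
        dut.clock(p_valid_i, p_data_i, n_ready_i)
        if p_valid_i and n_ready_i:         # upstream saw ready: item consumed
            pending.pop(0)
    return received


out = drive(SimpleHandshakeModel(lambda x: x * 2), [1, 2, 3],
            ready_pattern=[True, False, True])
print(out)
```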
607
608 class UnbufferedPipeline(ControlBase):
609 """ A simple pipeline stage with single-clock synchronisation
610 and two-way valid/ready synchronised signalling.
611
612 Note that a stall in one stage will result in the entire pipeline
613 chain stalling.
614
615 Also that unlike BufferedHandshake, the valid/ready signalling does NOT
616 travel synchronously with the data: the valid/ready signalling
617 combines in a *combinatorial* fashion. Therefore, a long pipeline
618 chain will lengthen propagation delays.
619
620 Argument: stage. see Stage API, above
621
622 stage-1 p.valid_i >>in stage n.valid_o out>> stage+1
623 stage-1 p.ready_o <<out stage n.ready_i <<in stage+1
624 stage-1 p.data_i >>in stage n.data_o out>> stage+1
625 | |
626 r_data result
627 | |
628 +--process ->-+
629
630 Attributes:
631 -----------
632 p.data_i : StageInput, shaped according to ispec
633 The pipeline input
634 n.data_o : StageOutput, shaped according to ospec
635 The pipeline output
636 r_data : input_shape according to ispec
637 A temporary (buffered) copy of a prior (valid) input.
638 This is HELD if the output is not ready. It is updated
639 SYNCHRONOUSLY.
640 result: output_shape according to ospec
641 The output of the combinatorial logic. it is updated
642 COMBINATORIALLY (no clock dependence).
643
644 Truth Table
645
646 Inputs Temp Output Data
647 ------- - ----- ----
648 P P N N ~NiR& N P
649 i o i o NoV o o
650 V R R V V R
651
652 ------- - - -
653 0 0 0 0 0 0 1 reg
654 0 0 0 1 1 1 0 reg
655 0 0 1 0 0 0 1 reg
656 0 0 1 1 0 0 1 reg
657 ------- - - -
658 0 1 0 0 0 0 1 reg
659 0 1 0 1 1 1 0 reg
660 0 1 1 0 0 0 1 reg
661 0 1 1 1 0 0 1 reg
662 ------- - - -
663 1 0 0 0 0 1 1 reg
664 1 0 0 1 1 1 0 reg
665 1 0 1 0 0 1 1 reg
666 1 0 1 1 0 1 1 reg
667 ------- - - -
668 1 1 0 0 0 1 1 process(data_i)
669 1 1 0 1 1 1 0 process(data_i)
670 1 1 1 0 0 1 1 process(data_i)
671 1 1 1 1 0 1 1 process(data_i)
672 ------- - - -
673
674 Note: PoR is *NOT* involved in the above decision-making.
675 """
676
677 def elaborate(self, platform):
678 self.m = m = ControlBase.elaborate(self, platform)
679
680 data_valid = Signal() # is data valid or not
681 r_data = _spec(self.stage.ospec, "r_tmp") # output type
682
683 # some temporaries
684 p_valid_i = Signal(reset_less=True)
685 pv = Signal(reset_less=True)
686 buf_full = Signal(reset_less=True)
687 m.d.comb += p_valid_i.eq(self.p.valid_i_test)
688 m.d.comb += pv.eq(self.p.valid_i & self.p.ready_o)
689 m.d.comb += buf_full.eq(~self.n.ready_i_test & data_valid)
690
691 m.d.comb += self.n.valid_o.eq(data_valid)
692 m.d.comb += self.p._ready_o.eq(~data_valid | self.n.ready_i_test)
693 m.d.sync += data_valid.eq(p_valid_i | buf_full)
694
695 with m.If(pv):
696 m.d.sync += nmoperator.eq(r_data, self.data_r)
697 data_o = self._postprocess(r_data) # XXX TBD, does nothing right now
698 m.d.comb += nmoperator.eq(self.n.data_o, data_o)
699
700 return self.m
701
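# The derived columns of the UnbufferedPipeline truth table can be
# regenerated from the three assignments in elaborate(). This sketch uses
# plain 0/1 integers in place of Signals; next_state and rows are
# hypothetical names.

```python
from itertools import product


def next_state(piv, por, nir, nov):
    """Return (temp, next NoV, PoR) for one row; por is unused on purpose."""
    temp = (not nir) and nov          # ~NiR & NoV (buf_full)
    nov_next = piv or temp            # data_valid.eq(p_valid_i | buf_full)
    por_out = (not nov) or nir        # _ready_o.eq(~data_valid | n.ready_i)
    return int(temp), int(nov_next), int(por_out)


rows = [(bits, next_state(*bits)) for bits in product((0, 1), repeat=4)]
for bits, outs in rows:
    print(*bits, "|", *outs)
```

# The rows come out in the same PiV PoR NiR NoV order as the docstring table.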
702
703 class UnbufferedPipeline2(ControlBase):
704 """ A simple pipeline stage with single-clock synchronisation
705 and two-way valid/ready synchronised signalling.
706
707 Note that a stall in one stage will result in the entire pipeline
708 chain stalling.
709
710 Also that unlike BufferedHandshake, the valid/ready signalling does NOT
711 travel synchronously with the data: the valid/ready signalling
712 combines in a *combinatorial* fashion. Therefore, a long pipeline
713 chain will lengthen propagation delays.
714
715 Argument: stage. see Stage API, above
716
717 stage-1 p.valid_i >>in stage n.valid_o out>> stage+1
718 stage-1 p.ready_o <<out stage n.ready_i <<in stage+1
719 stage-1 p.data_i >>in stage n.data_o out>> stage+1
720 | | |
721 +- process-> buf <-+
722 Attributes:
723 -----------
724 p.data_i : StageInput, shaped according to ispec
725 The pipeline input
726 n.data_o : StageOutput, shaped according to ospec
727 The pipeline output
728 buf : output_shape according to ospec
729 A temporary (buffered) copy of a valid output
730 This is HELD if the output is not ready. It is updated
731 SYNCHRONOUSLY.
732
733 Inputs Temp Output Data
734 ------- - -----
735 P P N N ~NiR& N P (buf_full)
736 i o i o NoV o o
737 V R R V V R
738
739 ------- - - -
740 0 0 0 0 0 0 1 process(data_i)
741 0 0 0 1 1 1 0 reg (odata, unchanged)
742 0 0 1 0 0 0 1 process(data_i)
743 0 0 1 1 0 0 1 process(data_i)
744 ------- - - -
745 0 1 0 0 0 0 1 process(data_i)
746 0 1 0 1 1 1 0 reg (odata, unchanged)
747 0 1 1 0 0 0 1 process(data_i)
748 0 1 1 1 0 0 1 process(data_i)
749 ------- - - -
750 1 0 0 0 0 1 1 process(data_i)
751 1 0 0 1 1 1 0 reg (odata, unchanged)
752 1 0 1 0 0 1 1 process(data_i)
753 1 0 1 1 0 1 1 process(data_i)
754 ------- - - -
755 1 1 0 0 0 1 1 process(data_i)
756 1 1 0 1 1 1 0 reg (odata, unchanged)
757 1 1 1 0 0 1 1 process(data_i)
758 1 1 1 1 0 1 1 process(data_i)
759 ------- - - -
760
761 Note: PoR is *NOT* involved in the above decision-making.
762 """
763
764 def elaborate(self, platform):
765 self.m = m = ControlBase.elaborate(self, platform)
766
767 buf_full = Signal() # is data valid or not
768 buf = _spec(self.stage.ospec, "r_tmp") # output type
769
770 # some temporaries
771 p_valid_i = Signal(reset_less=True)
772 m.d.comb += p_valid_i.eq(self.p.valid_i_test)
773
774 m.d.comb += self.n.valid_o.eq(buf_full | p_valid_i)
775 m.d.comb += self.p._ready_o.eq(~buf_full)
776 m.d.sync += buf_full.eq(~self.n.ready_i_test & self.n.valid_o)
777
778 data_o = Mux(buf_full, buf, self.data_r)
779 data_o = self._postprocess(data_o) # XXX TBD, does nothing right now
780 m.d.comb += nmoperator.eq(self.n.data_o, data_o)
781 m.d.sync += nmoperator.eq(buf, self.n.data_o)
782
783 return self.m
784
785
786 class PassThroughHandshake(ControlBase):
787 """ A control block that delays by one clock cycle.
788
789 Inputs Temporary Output Data
790 ------- ------------------ ----- ----
791 P P N N PiV& PiV| NiR| pvr N P (pvr)
792 i o i o PoR ~PoR ~NoV o o
793 V R R V V R
794
795 ------- - - - - - -
796 0 0 0 0 0 1 1 0 1 1 odata (unchanged)
797 0 0 0 1 0 1 0 0 1 0 odata (unchanged)
798 0 0 1 0 0 1 1 0 1 1 odata (unchanged)
799 0 0 1 1 0 1 1 0 1 1 odata (unchanged)
800 ------- - - - - - -
801 0 1 0 0 0 0 1 0 0 1 odata (unchanged)
802 0 1 0 1 0 0 0 0 0 0 odata (unchanged)
803 0 1 1 0 0 0 1 0 0 1 odata (unchanged)
804 0 1 1 1 0 0 1 0 0 1 odata (unchanged)
805 ------- - - - - - -
806 1 0 0 0 0 1 1 1 1 1 process(in)
807 1 0 0 1 0 1 0 0 1 0 odata (unchanged)
808 1 0 1 0 0 1 1 1 1 1 process(in)
809 1 0 1 1 0 1 1 1 1 1 process(in)
810 ------- - - - - - -
811 1 1 0 0 1 1 1 1 1 1 process(in)
812 1 1 0 1 1 1 0 0 1 0 odata (unchanged)
813 1 1 1 0 1 1 1 1 1 1 process(in)
814 1 1 1 1 1 1 1 1 1 1 process(in)
815 ------- - - - - - -
816
817 """
818
819 def elaborate(self, platform):
820 self.m = m = ControlBase.elaborate(self, platform)
821
822 r_data = _spec(self.stage.ospec, "r_tmp") # output type
823
824 # temporaries
825 p_valid_i = Signal(reset_less=True)
826 pvr = Signal(reset_less=True)
827 m.d.comb += p_valid_i.eq(self.p.valid_i_test)
828 m.d.comb += pvr.eq(p_valid_i & self.p.ready_o)
829
830 m.d.comb += self.p.ready_o.eq(~self.n.valid_o | self.n.ready_i_test)
831 m.d.sync += self.n.valid_o.eq(p_valid_i | ~self.p.ready_o)
832
833 odata = Mux(pvr, self.data_r, r_data)
834 m.d.sync += nmoperator.eq(r_data, odata)
835 r_data = self._postprocess(r_data) # XXX TBD, does nothing right now
836 m.d.comb += nmoperator.eq(self.n.data_o, r_data)
837
838 return m
839
840
841 class RegisterPipeline(UnbufferedPipeline):
842 """ A pipeline stage that delays by one clock cycle, creating a
843 sync'd latch out of data_o and valid_o as an indirect byproduct
844 of using PassThroughStage
845 """
846 def __init__(self, iospecfn):
847 UnbufferedPipeline.__init__(self, PassThroughStage(iospecfn))
848
849
850 class FIFOControl(ControlBase):
851 """ FIFO Control. Uses Queue to store data, coincidentally
852 happens to have same valid/ready signalling as Stage API.
853
854 data_i -> fifo.din -> FIFO -> fifo.dout -> data_o
855 """
856 def __init__(self, depth, stage, in_multi=None, stage_ctl=False,
857 fwft=True, pipe=False):
858 """ FIFO Control
859
860 * :depth: number of entries in the FIFO
861 * :stage: data processing block
862 * :fwft: first word fall-thru mode (non-fwft introduces delay)
863 * :pipe: specifies pipe mode.
864
865 when fwft = True it indicates that transfers may occur
866 combinatorially through stage processing in the same clock cycle.
867 This requires that the Stage be a Mealy FSM:
868 https://en.wikipedia.org/wiki/Mealy_machine
869
870 when fwft = False it indicates that all output signals are
871 produced only from internal registers or memory, i.e. that the
872 Stage is a Moore FSM:
873 https://en.wikipedia.org/wiki/Moore_machine
874
875 data is processed (and located) as follows:
876
877 self.p self.stage temp fn temp fn temp fp self.n
878 data_i->process()->result->cat->din.FIFO.dout->cat(data_o)
879
880 yes, really: cat produces a Cat() which can be assigned to.
881 this is how the FIFO gets de-catted without needing a de-cat
882 function
883 """
884 self.fwft = fwft
885 self.pipe = pipe
886 self.fdepth = depth
887 ControlBase.__init__(self, stage, in_multi, stage_ctl)
888
889 def elaborate(self, platform):
890 self.m = m = ControlBase.elaborate(self, platform)
891
892 # make a FIFO with a signal of equal width to the data_o.
893 (fwidth, _) = nmoperator.shape(self.n.data_o)
894 fifo = Queue(fwidth, self.fdepth, fwft=self.fwft, pipe=self.pipe)
895 m.submodules.fifo = fifo
896
897 def processfn(data_i):
898 # store result of processing in combinatorial temporary
899 result = _spec(self.stage.ospec, "r_temp")
900 m.d.comb += nmoperator.eq(result, self.process(data_i))
901 return nmoperator.cat(result)
902
903 ## prev: make the FIFO (Queue object) "look" like a PrevControl...
904 m.submodules.fp = fp = PrevControl()
905 fp.valid_i, fp._ready_o, fp.data_i = fifo.we, fifo.writable, fifo.din
906 m.d.comb += fp._connect_in(self.p, fn=processfn)
907
908 # next: make the FIFO (Queue object) "look" like a NextControl...
909 m.submodules.fn = fn = NextControl()
910 fn.valid_o, fn.ready_i, fn.data_o = fifo.readable, fifo.re, fifo.dout
911 connections = fn._connect_out(self.n, fn=nmoperator.cat)
912
913 # ok ok so we can't just do the ready/valid eqs straight:
914 # first 2 from connections are the ready/valid, 3rd is data.
915 if self.fwft:
916 m.d.comb += connections[:2] # combinatorial on next ready/valid
917 else:
918 m.d.sync += connections[:2] # non-fwft mode needs sync
919 data_o = connections[2] # get the data
920 data_o = self._postprocess(data_o) # XXX TBD, does nothing right now
921 m.d.comb += data_o
922
923 return m
924
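# The "cat on the way in, assignable cat on the way out" trick relies on
# fields being packed into one FIFO word and split back by width. A plain
# integer sketch of that packing, assuming the first field sits at the
# least significant bits (as nmigen's Cat does); cat and decat here are
# hypothetical helpers, not nmoperator functions.

```python
def cat(fields):
    """Pack (value, width) pairs into one integer, first field at the LSB."""
    word, shift = 0, 0
    for value, width in fields:
        word |= (value & ((1 << width) - 1)) << shift
        shift += width
    return word


def decat(word, widths):
    """Split a packed word back into fields of the given widths."""
    values = []
    for width in widths:
        values.append(word & ((1 << width) - 1))
        word >>= width
    return values


packed = cat([(0x3, 4), (0x5, 4), (0x1, 1)])
print(hex(packed), decat(packed, [4, 4, 1]))
```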
925
926 # aka "RegStage".
927 class UnbufferedPipeline(FIFOControl):
928 def __init__(self, stage, in_multi=None, stage_ctl=False):
929 FIFOControl.__init__(self, 1, stage, in_multi, stage_ctl,
930 fwft=True, pipe=False)
931
932 # aka "BreakReadyStage" XXX had to set fwft=True to get it to work
933 class PassThroughHandshake(FIFOControl):
934 def __init__(self, stage, in_multi=None, stage_ctl=False):
935 FIFOControl.__init__(self, 1, stage, in_multi, stage_ctl,
936 fwft=True, pipe=True)
937
938 # this is *probably* BufferedHandshake, although test #997 now succeeds.
939 class BufferedHandshake(FIFOControl):
940 def __init__(self, stage, in_multi=None, stage_ctl=False):
941 FIFOControl.__init__(self, 2, stage, in_multi, stage_ctl,
942 fwft=True, pipe=False)
943
944
945 """
946 # this is *probably* SimpleHandshake (note: memory cell size=0)
947 class SimpleHandshake(FIFOControl):
948 def __init__(self, stage, in_multi=None, stage_ctl=False):
949 FIFOControl.__init__(self, 0, stage, in_multi, stage_ctl,
950 fwft=True, pipe=False)
951 """