1 """ Pipeline API. For multi-input and multi-output variants, see multipipe.
2
3 Associated development bugs:
4 * http://bugs.libre-riscv.org/show_bug.cgi?id=64
5 * http://bugs.libre-riscv.org/show_bug.cgi?id=57
6
7 Important: see Stage API (stageapi.py) and IO Control API
8 (iocontrol.py) in combination with below. This module
9 "combines" the Stage API with the IO Control API to create
10 the Pipeline API.
11
12 The one critically important difference between StageAPI and
13 PipelineAPI:
14
15 * StageAPI: combinatorial (NO REGISTERS / LATCHES PERMITTED)
16 * PipelineAPI: synchronous registers / latches get added here
17
18 RecordBasedStage:
19 ----------------
20
21 A convenience class that takes an input shape, output shape, a
22 "processing" function and an optional "setup" function. Honestly
23 though, there's not much more effort to just... create a class
24 that returns a couple of Records (see ExampleAddRecordStage in
25 examples).
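
    A rough usage sketch (the shapes and the add function are illustrative
    only; it is assumed here that process() may return a dict of
    expressions for nmoperator.eq to assign to the output Record):

        def add_fn(i):
            # combinatorial only: no registers / latches permitted
            return {'sum': i.a + i.b}

        stage = RecordBasedStage([('a', 16), ('b', 16)],   # input shape
                                 [('sum', 16)],            # output shape
                                 add_fn)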
26
27 PassThroughStage:
28 ----------------
29
30 A convenience class that takes a single iospec function as a parameter;
31 the same function is called to create both the (identical) input and
32 output specs. Its process() function simply returns its input.
33
34 Instances of this class are completely redundant if handed to
35 StageChain, however when passed to UnbufferedPipeline they
36 can be used to introduce a single clock delay.
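
    A minimal sketch (iospecfn here is a hypothetical spec function
    returning a 16-bit Signal):

        def iospecfn():
            return Signal(16, name="data")

        # handed to UnbufferedPipeline, this introduces one clock of delay
        delay = UnbufferedPipeline(PassThroughStage(iospecfn))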
37
38 ControlBase:
39 -----------
40
41 The base class for pipelines. Contains previous and next ready/valid/data.
42 Also has an extremely useful "connect" function that can be used to
43 connect a chain of pipelines and present the exact same prev/next
44 ready/valid/data API.
45
46 Note: pipelines basically do not become pipelines as such until
47 handed to a derivative of ControlBase. ControlBase itself is *not*
48 strictly considered a pipeline class. Wishbone and AXI4 (master or
49 slave) could be derived from ControlBase, for example.
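
    A rough chaining sketch (StageA and StageB are hypothetical
    Stage-API-compliant classes; this would typically sit inside the
    elaborate() of a ControlBase derivative created with no stage of
    its own):

        pipe1 = SimpleHandshake(StageA())
        pipe2 = SimpleHandshake(StageB())
        m.submodules.pipe1 = pipe1
        m.submodules.pipe2 = pipe2
        m.d.comb += self.connect([pipe1, pipe2])
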
50 UnbufferedPipeline:
51 ------------------
52
53 A simple stalling clock-synchronised pipeline that has no buffering
54 (unlike BufferedHandshake). Data flows on *every* clock cycle when
55 the conditions are right (this is nominally when the input is valid
56 and the output is ready).
57
58 A stall anywhere along the line will result in a stall propagating
59 back up the entire chain. The BufferedHandshake by contrast will buffer
60 incoming data, allowing previous stages one clock cycle's grace before
61 also having to stall.
62
63 An advantage of the UnbufferedPipeline over the Buffered one is
64 that the amount of logic needed (number of gates) is greatly
65 reduced (basically, there is no second set of buffers).
66
67 The disadvantage of the UnbufferedPipeline is that the valid/ready
68 logic, if chained together, is *combinatorial*, resulting in
69 progressively larger gate delay.
70
71 PassThroughHandshake:
72 ------------------
73
74 A Control class that introduces a single clock delay, passing its
75 data through unaltered. Unlike RegisterPipeline (which relies
76 on UnbufferedPipeline and PassThroughStage) it handles ready/valid
77 itself.
78
79 RegisterPipeline:
80 ----------------
81
82 A convenience class combining UnbufferedPipeline with a PassThroughStage:
83 because UnbufferedPipeline introduces a single clock delay, the result is
84 a pipeline stage that simply delays its (unmodified) input by one cycle.
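
    In code terms, RegisterPipeline(iospecfn) is simply:

        UnbufferedPipeline(PassThroughStage(iospecfn))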
85
86 BufferedHandshake:
87 ----------------
88
89 nmigen implementation of buffered pipeline stage, based on zipcpu:
90 https://zipcpu.com/blog/2017/08/14/strategies-for-pipelining.html
91
92 this module requires quite a bit of thought to understand how it works
93 (and why it is needed in the first place). reading the above is
94 *strongly* recommended.
95
96 unlike john dawson's IEEE754 FPU STB/ACK signalling, which requires
97 the STB / ACK signals to raise and lower (on separate clocks) before
98 data may proceed (thus only allowing one piece of data to proceed
99 on *ALTERNATE* cycles), the signalling here is a true pipeline
100 where data will flow on *every* clock when the conditions are right.
101
102 input acceptance conditions are when:
103 * incoming previous-stage strobe (p.valid_i) is HIGH
104 * outgoing previous-stage ready (p.ready_o) is LOW
105
106 output transmission conditions are when:
107 * outgoing next-stage strobe (n.valid_o) is HIGH
108 * incoming next-stage ready (n.ready_i) is LOW
109
110 the tricky bit is when the input has valid data and the output is not
111 ready to accept it. if it wasn't for the clock synchronisation, it
112 would be possible to tell the input "hey don't send that data, we're
113 not ready". unfortunately, it's not possible to "change the past":
114 the previous stage *has no choice* but to pass on its data.
115
116 therefore, the incoming data *must* be accepted - and stored: that
117 is the responsibility / contract that this stage *must* accept.
118 on the same clock, it's possible to tell the input that it must
119 not send any more data. this is the "stall" condition.
120
121 we now effectively have *two* possible pieces of data to "choose" from:
122 the buffered data, and the incoming data. the decision as to which
123 to process and output is based on whether we are in "stall" or not.
124 i.e. when the next stage is no longer ready, the output comes from
125 the buffer if a stall had previously occurred, otherwise it comes
126 direct from processing the input.
127
128 this allows us to respect a synchronous "travelling STB" with what
129 dan calls a "buffered handshake".
130
131 it's quite a complex state machine!
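
    usage, however, is the same as for any other ControlBase derivative
    (MyStage below is a hypothetical Stage-API-compliant class):

        pipe = BufferedHandshake(MyStage())
        m.submodules.pipe = pipe     # ControlBase is an Elaboratable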
132
133 SimpleHandshake
134 ---------------
135
136 Synchronised pipeline, Based on:
137 https://github.com/ZipCPU/dbgbus/blob/master/hexbus/rtl/hbdeword.v
138 """
139
140 from nmigen import Signal, Mux, Module, Elaboratable, Const
141 from nmigen.cli import verilog, rtlil
142 from nmigen.hdl.rec import Record
143
144 from nmutil.queue import Queue
145 import inspect
146
147 from nmutil.iocontrol import (PrevControl, NextControl, Object, RecordObject)
148 from nmutil.stageapi import (_spec, StageCls, Stage, StageChain, StageHelper)
149 from nmutil import nmoperator
150
151
152 class RecordBasedStage(Stage):
153 """ convenience class which provides a Records-based layout.
154 honestly it's a lot easier just to create a direct Records-based
155 class (see ExampleAddRecordStage)
156 """
157 def __init__(self, in_shape, out_shape, processfn, setupfn=None):
158 self.in_shape = in_shape
159 self.out_shape = out_shape
160 self.__process = processfn
161 self.__setup = setupfn
162 def ispec(self): return Record(self.in_shape)
163 def ospec(self): return Record(self.out_shape)
164 def process(self, i): return self.__process(i)
165 def setup(self, m, i): return self.__setup(m, i)
166
167
168 class PassThroughStage(StageCls):
169 """ a pass-through stage whose input data spec is identical to its output
170 spec, and which "passes through" its data from input to output unaltered.
171
172 use this basically to explicitly make any data spec Stage-compliant.
173 (many APIs would potentially use a static "wrap" method in e.g.
174 StageCls to achieve a similar effect)
175 """
176 def __init__(self, iospecfn): self.iospecfn = iospecfn
177 def ispec(self): return self.iospecfn()
178 def ospec(self): return self.iospecfn()
179
180
181 class ControlBase(StageHelper, Elaboratable):
182 """ Common functions for Pipeline API. Note: a "pipeline stage" only
183 exists (conceptually) when a ControlBase derivative is handed
184 a Stage (combinatorial block)
185
186 NOTE: ControlBase derives from StageHelper, making it accidentally
187 compliant with the Stage API. Using those functions directly
188 *BYPASSES* a ControlBase instance's ready/valid signalling, which
189 clearly should not be done without a really, really good reason.
190 """
191 def __init__(self, stage=None, in_multi=None, stage_ctl=False, maskwid=0):
192 """ Base class containing ready/valid/data to previous and next stages
193
194 * p: contains ready/valid to the previous stage
195 * n: contains ready/valid to the next stage
196
197 Except when calling ControlBase.connect(), the user must also:
198 * add data_i member to PrevControl (p) and
199 * add data_o member to NextControl (n)
200 Calling ControlBase._new_data is a good way to do that.
201 """
202 print ("ControlBase", self, stage, in_multi, stage_ctl)
203 StageHelper.__init__(self, stage)
204
205 # set up input and output IO ACK (prev/next ready/valid)
206 self.p = PrevControl(in_multi, stage_ctl, maskwid=maskwid)
207 self.n = NextControl(stage_ctl, maskwid=maskwid)
208
209 # set up the input and output data
210 if stage is not None:
211 self._new_data("data")
212
213 def _new_data(self, name):
214 """ allocates new data_i and data_o
215 """
216 self.p.data_i, self.n.data_o = self.new_specs(name)
217
218 @property
219 def data_r(self):
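# the combinatorially-processed version of the current input data (p.data_i)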
220 return self.process(self.p.data_i)
221
222 def connect_to_next(self, nxt):
223 """ helper function to connect to the next stage data/valid/ready.
224 """
225 return self.n.connect_to_next(nxt.p)
226
227 def _connect_in(self, prev):
228 """ internal helper function to connect stage to an input source.
229 do not use to connect stage-to-stage!
230 """
231 return self.p._connect_in(prev.p)
232
233 def _connect_out(self, nxt):
234 """ internal helper function to connect stage to an output source.
235 do not use to connect stage-to-stage! returns the (valid, ready,
236 data) eq assignments, for the caller to add to m.d.comb or m.d.sync.
"""
237 return self.n._connect_out(nxt.n)
238
239 def connect(self, pipechain):
240 """ connects a chain (list) of Pipeline instances together and
241 links them to this ControlBase instance:
242
243 in <----> self <---> out
244 | ^
245 v |
246 [pipe1, pipe2, pipe3, pipe4]
247 | ^ | ^ | ^
248 v | v | v |
249 out---in out--in out---in
250
251 Also takes care of allocating data_i/data_o, by looking up
252 the data spec for each end of the pipechain, i.e. it is NOT
253 necessary to allocate self.p.data_i or self.n.data_o manually:
254 this is handled AUTOMATICALLY, here.
255
256 Basically this function is the direct equivalent of StageChain,
257 except that unlike StageChain, the Pipeline logic is followed.
258
259 Just as StageChain presents an object that conforms to the
260 Stage API from a list of objects that also conform to the
261 Stage API, an object that calls this Pipeline connect function
262 has the exact same pipeline API as the list of pipeline objects
263 it is called with.
264
265 Thus it becomes possible to build up larger chains recursively.
266 More complex chains (multi-input, multi-output) will have to be
267 done manually.
268
269 Argument:
270
271 * :pipechain: - a sequence of ControlBase-derived classes
272 (must be one or more in length)
273
274 Returns:
275
276 * a list of eq assignments that will need to be added in
277 an elaborate() to m.d.comb
278 """
279 assert len(pipechain) > 0, "pipechain must be non-zero length"
280 assert self.stage is None, "do not use connect with a stage"
281 eqs = [] # collated list of assignment statements
282
283 # connect inter-chain
284 for i in range(len(pipechain)-1):
285 pipe1 = pipechain[i] # earlier
286 pipe2 = pipechain[i+1] # later (by 1)
287 eqs += pipe1.connect_to_next(pipe2) # earlier n to later p
288
289 # connect front and back of chain to ourselves
290 front = pipechain[0] # first in chain
291 end = pipechain[-1] # last in chain
292 self.set_specs(front, end) # sets up ispec/ospec functions
293 self._new_data("chain") # NOTE: REPLACES existing data
294 eqs += front._connect_in(self) # front p to our p
295 eqs += end._connect_out(self) # end n to our n
296
297 return eqs
298
299 def set_input(self, i):
300 """ helper function to set the input data (used in unit tests)
301 """
302 return nmoperator.eq(self.p.data_i, i)
303
304 def __iter__(self):
305 yield from self.p # yields ready/valid/data (data also gets yielded)
306 yield from self.n # ditto
307
308 def ports(self):
309 return list(self)
310
311 def elaborate(self, platform):
312 """ handles case where stage has dynamic ready/valid functions
313 """
314 m = Module()
315 m.submodules.p = self.p
316 m.submodules.n = self.n
317
318 self.setup(m, self.p.data_i)
319
320 if not self.p.stage_ctl:
321 return m
322
323 # intercept the previous (outgoing) "ready", combine with stage ready
324 m.d.comb += self.p.s_ready_o.eq(self.p._ready_o & self.stage.d_ready)
325
326 # intercept the next (incoming) "ready" and combine it with data valid
327 sdv = self.stage.d_valid(self.n.ready_i)
328 m.d.comb += self.n.d_valid.eq(self.n.ready_i & sdv)
329
330 return m
331
332
333 class BufferedHandshake(ControlBase):
334 """ buffered pipeline stage. data and strobe signals travel in sync.
335 if ever input data has been accepted but the output is not ready,
336 the processed data is shunted into a temporary register.
337
338 Argument: stage. see Stage API above
339
340 stage-1 p.valid_i >>in stage n.valid_o out>> stage+1
341 stage-1 p.ready_o <<out stage n.ready_i <<in stage+1
342 stage-1 p.data_i >>in stage n.data_o out>> stage+1
343 | |
344 process --->----^
345 | |
346 +-- r_data ->-+
347
348 input data p.data_i is read (only), is processed and goes into an
349 intermediate result store [process()]. this is updated combinatorially.
350
351 in a non-stall condition, the intermediate result will go into the
352 output (update_output). however if ever there is a stall, it goes
353 into r_data instead [update_buffer()].
354
355 when the stall condition is released, r_data is the first
356 to be transferred to the output [flush_buffer()], and the stall
357 condition cleared.
358
359 on the next cycle (as long as stall is not raised again) the
360 input may begin to be processed and transferred directly to output.
361 """
362
363 def elaborate(self, platform):
364 self.m = ControlBase.elaborate(self, platform)
365
366 result = _spec(self.stage.ospec, "r_tmp")
367 r_data = _spec(self.stage.ospec, "r_data")
368
369 # establish some combinatorial temporaries
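# (naming: nir = n.ready_i, por = p.ready_o, nov = n.valid_o,
# piv = p.valid_i; a trailing "n" on a token indicates inversion)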
370 o_n_validn = Signal(reset_less=True)
371 n_ready_i = Signal(reset_less=True, name="n_i_rdy_data")
372 nir_por = Signal(reset_less=True)
373 nir_por_n = Signal(reset_less=True)
374 p_valid_i = Signal(reset_less=True)
375 nir_novn = Signal(reset_less=True)
376 nirn_novn = Signal(reset_less=True)
377 por_pivn = Signal(reset_less=True)
378 npnn = Signal(reset_less=True)
379 self.m.d.comb += [p_valid_i.eq(self.p.valid_i_test),
380 o_n_validn.eq(~self.n.valid_o),
381 n_ready_i.eq(self.n.ready_i_test),
382 nir_por.eq(n_ready_i & self.p._ready_o),
383 nir_por_n.eq(n_ready_i & ~self.p._ready_o),
384 nir_novn.eq(n_ready_i | o_n_validn),
385 nirn_novn.eq(~n_ready_i & o_n_validn),
386 npnn.eq(nir_por | nirn_novn),
387 por_pivn.eq(self.p._ready_o & ~p_valid_i)
388 ]
389
390 # store result of processing in combinatorial temporary
391 self.m.d.comb += nmoperator.eq(result, self.data_r)
392
393 # if not in stall condition, update the temporary register
394 with self.m.If(self.p.ready_o): # not stalled
395 self.m.d.sync += nmoperator.eq(r_data, result) # update buffer
396
397 # data pass-through conditions
398 with self.m.If(npnn):
399 data_o = self._postprocess(result) # XXX TBD, does nothing right now
400 self.m.d.sync += [self.n.valid_o.eq(p_valid_i), # valid if p_valid
401 nmoperator.eq(self.n.data_o, data_o), # update out
402 ]
403 # buffer flush conditions (NOTE: can override data passthru conditions)
404 with self.m.If(nir_por_n): # next now ready, but we had stalled (buffer in use)
405 # Flush the [already processed] buffer to the output port.
406 data_o = self._postprocess(r_data) # XXX TBD, does nothing right now
407 self.m.d.sync += [self.n.valid_o.eq(1), # reg empty
408 nmoperator.eq(self.n.data_o, data_o), # flush
409 ]
410 # output ready conditions
411 self.m.d.sync += self.p._ready_o.eq(nir_novn | por_pivn)
412
413 return self.m
414
415
416 class MaskNoDelayCancellable(ControlBase):
417 """ Mask-activated Cancellable pipeline (that does not respect "ready")
418
419 Based on (identical behaviour to) SimpleHandshake.
420 TODO: decide whether to merge *into* SimpleHandshake.
421
422 Argument: stage. see Stage API above
423
424 stage-1 p.valid_i >>in stage n.valid_o out>> stage+1
425 stage-1 p.ready_o <<out stage n.ready_i <<in stage+1
426 stage-1 p.data_i >>in stage n.data_o out>> stage+1
427 | |
428 +--process->--^
429 """
430 def __init__(self, stage, maskwid, in_multi=None, stage_ctl=False):
431 ControlBase.__init__(self, stage, in_multi, stage_ctl, maskwid)
432
433 def elaborate(self, platform):
434 self.m = m = ControlBase.elaborate(self, platform)
435
436 # store result of processing in combinatorial temporary
437 result = _spec(self.stage.ospec, "r_tmp")
438 m.d.comb += nmoperator.eq(result, self.data_r)
439
440 # establish if the data should be passed on. cancellation is
441 # a global signal.
442 # XXX EXCEPTIONAL CIRCUMSTANCES: inspection of the data payload
443 # is NOT "normal" for the Stage API.
444 p_valid_i = Signal(reset_less=True)
445 #print ("self.p.data_i", self.p.data_i)
446 maskedout = Signal(len(self.p.mask_i), reset_less=True)
447 m.d.comb += maskedout.eq(self.p.mask_i & ~self.p.stop_i)
448 m.d.comb += p_valid_i.eq(maskedout.bool())
449
450 # if idmask nonzero, mask gets passed on (and register set).
451 # register is left as-is if idmask is zero, but out-mask is set to zero
452 # note however: only the *uncancelled* mask bits get passed on
453 m.d.sync += self.n.valid_o.eq(p_valid_i)
454 m.d.sync += self.n.mask_o.eq(Mux(p_valid_i, maskedout, 0))
455 with m.If(p_valid_i):
456 data_o = self._postprocess(result) # XXX TBD, does nothing right now
457 m.d.sync += nmoperator.eq(self.n.data_o, data_o) # update output
458
459 # output valid is registered above (m.d.sync); the input is treated
460 # as always "ready" (this pipeline does not respect "ready")
461 #m.d.comb += self.p._ready_o.eq(self.n.ready_i_test)
462 m.d.comb += self.p._ready_o.eq(Const(1))
463
464 # always pass on stop (as combinatorial: single signal)
465 m.d.comb += self.n.stop_o.eq(self.p.stop_i)
466
467 return self.m
468
469
470 class MaskCancellable(ControlBase):
471 """ Mask-activated Cancellable pipeline
472
473 Arguments:
474
475 * stage. see Stage API above
476 * maskwid - sets up cancellation capability (mask and stop).
477 * in_multi
478 * stage_ctl
479 * dynamic - allows switching from sync to combinatorial (passthrough)
480 USE WITH CARE. will need the entire pipe to be quiescent
481 before switching, otherwise data WILL be destroyed.
482
483 stage-1 p.valid_i >>in stage n.valid_o out>> stage+1
484 stage-1 p.ready_o <<out stage n.ready_i <<in stage+1
485 stage-1 p.data_i >>in stage n.data_o out>> stage+1
486 | |
487 +--process->--^
488 """
489 def __init__(self, stage, maskwid, in_multi=None, stage_ctl=False,
490 dynamic=False):
491 ControlBase.__init__(self, stage, in_multi, stage_ctl, maskwid)
492 self.dynamic = dynamic
493 if dynamic:
494 self.latchmode = Signal()
495 else:
496 self.latchmode = Const(1)
497
498 def elaborate(self, platform):
499 self.m = m = ControlBase.elaborate(self, platform)
500
501 mask_r = Signal(len(self.p.mask_i), reset_less=True)
502 data_r = _spec(self.stage.ospec, "data_r")
503 m.d.comb += nmoperator.eq(data_r, self._postprocess(self.data_r))
504
505 with m.If(self.latchmode):
506 r_busy = Signal()
507 r_latch = _spec(self.stage.ospec, "r_latch")
508
509 # establish if the data should be passed on. cancellation is
510 # a global signal.
511 p_valid_i = Signal(reset_less=True)
512 #print ("self.p.data_i", self.p.data_i)
513 maskedout = Signal(len(self.p.mask_i), reset_less=True)
514 m.d.comb += maskedout.eq(self.p.mask_i & ~self.p.stop_i)
515
516 # establish some combinatorial temporaries
517 n_ready_i = Signal(reset_less=True, name="n_i_rdy_data")
518 p_valid_i_p_ready_o = Signal(reset_less=True)
519 m.d.comb += [p_valid_i.eq(self.p.valid_i_test & maskedout.bool()),
520 n_ready_i.eq(self.n.ready_i_test),
521 p_valid_i_p_ready_o.eq(p_valid_i & self.p.ready_o),
522 ]
523
524 # if idmask nonzero, mask gets passed on (and register set).
525 # register is left as-is if idmask is zero, but out-mask is set to
526 # zero
527 # note however: only the *uncancelled* mask bits get passed on
528 m.d.sync += mask_r.eq(Mux(p_valid_i, maskedout, 0))
529 m.d.comb += self.n.mask_o.eq(mask_r)
530
531 # always pass on stop (as combinatorial: single signal)
532 m.d.comb += self.n.stop_o.eq(self.p.stop_i)
533
534 stor = Signal(reset_less=True)
535 m.d.comb += stor.eq(p_valid_i_p_ready_o | n_ready_i)
536 with m.If(stor):
537 # store result of processing in combinatorial temporary
538 m.d.sync += nmoperator.eq(r_latch, data_r)
539
540 # previous valid and ready
541 with m.If(p_valid_i_p_ready_o):
542 m.d.sync += r_busy.eq(1) # output valid
543 # previous invalid or not ready, however next is accepting
544 with m.Elif(n_ready_i):
545 m.d.sync += r_busy.eq(0) # ...so set output invalid
546
547 # output set combinatorially from latch
548 m.d.comb += nmoperator.eq(self.n.data_o, r_latch)
549
550 m.d.comb += self.n.valid_o.eq(r_busy)
551 # if next is ready, so is previous
552 m.d.comb += self.p._ready_o.eq(n_ready_i)
553
554 with m.Else():
555 # pass everything straight through. p connected to n: data,
556 # valid, mask, everything. this is "effectively" just a
557 # StageChain: MaskCancellable is doing "nothing" except
558 # combinatorially passing everything through
559 # (except now it's *dynamically selectable* whether to do that)
560 m.d.comb += self.n.valid_o.eq(self.p.valid_i_test)
561 m.d.comb += self.p._ready_o.eq(self.n.ready_i_test)
562 m.d.comb += self.n.stop_o.eq(self.p.stop_i)
563 m.d.comb += self.n.mask_o.eq(self.p.mask_i)
564 m.d.comb += nmoperator.eq(self.n.data_o, data_r)
565
566 return self.m
567
568
569 class SimpleHandshake(ControlBase):
570 """ simple handshake control. data and strobe signals travel in sync.
571 implements the protocol used by Wishbone and AXI4.
572
573 Argument: stage. see Stage API above
574
575 stage-1 p.valid_i >>in stage n.valid_o out>> stage+1
576 stage-1 p.ready_o <<out stage n.ready_i <<in stage+1
577 stage-1 p.data_i >>in stage n.data_o out>> stage+1
578 | |
579 +--process->--^
580 Truth Table
581
582 Inputs Temporary Output Data
583 ------- ---------- ----- ----
584 P P N N PiV& ~NiR& N P
585 i o i o PoR NoV o o
586 V R R V V R
587
588 ------- - - - -
589 0 0 0 0 0 0 >0 0 reg
590 0 0 0 1 0 1 >1 0 reg
591 0 0 1 0 0 0 0 1 process(data_i)
592 0 0 1 1 0 0 0 1 process(data_i)
593 ------- - - - -
594 0 1 0 0 0 0 >0 0 reg
595 0 1 0 1 0 1 >1 0 reg
596 0 1 1 0 0 0 0 1 process(data_i)
597 0 1 1 1 0 0 0 1 process(data_i)
598 ------- - - - -
599 1 0 0 0 0 0 >0 0 reg
600 1 0 0 1 0 1 >1 0 reg
601 1 0 1 0 0 0 0 1 process(data_i)
602 1 0 1 1 0 0 0 1 process(data_i)
603 ------- - - - -
604 1 1 0 0 1 0 1 0 process(data_i)
605 1 1 0 1 1 1 1 0 process(data_i)
606 1 1 1 0 1 0 1 1 process(data_i)
607 1 1 1 1 1 0 1 1 process(data_i)
608 ------- - - - -
609 """
610
611 def elaborate(self, platform):
612 self.m = m = ControlBase.elaborate(self, platform)
613
614 r_busy = Signal()
615 result = _spec(self.stage.ospec, "r_tmp")
616
617 # establish some combinatorial temporaries
618 n_ready_i = Signal(reset_less=True, name="n_i_rdy_data")
619 p_valid_i_p_ready_o = Signal(reset_less=True)
620 p_valid_i = Signal(reset_less=True)
621 m.d.comb += [p_valid_i.eq(self.p.valid_i_test),
622 n_ready_i.eq(self.n.ready_i_test),
623 p_valid_i_p_ready_o.eq(p_valid_i & self.p.ready_o),
624 ]
625
626 # store result of processing in combinatorial temporary
627 m.d.comb += nmoperator.eq(result, self.data_r)
628
629 # previous valid and ready
630 with m.If(p_valid_i_p_ready_o):
631 data_o = self._postprocess(result) # XXX TBD, does nothing right now
632 m.d.sync += [r_busy.eq(1), # output valid
633 nmoperator.eq(self.n.data_o, data_o), # update output
634 ]
635 # previous invalid or not ready, however next is accepting
636 with m.Elif(n_ready_i):
637 data_o = self._postprocess(result) # XXX TBD, does nothing right now
638 m.d.sync += [nmoperator.eq(self.n.data_o, data_o)]
639 # TODO: could still send data here (if there was any)
640 #m.d.sync += self.n.valid_o.eq(0) # ...so set output invalid
641 m.d.sync += r_busy.eq(0) # ...so set output invalid
642
643 m.d.comb += self.n.valid_o.eq(r_busy)
644 # if next is ready, so is previous
645 m.d.comb += self.p._ready_o.eq(n_ready_i)
646
647 return self.m
648
649
650 class UnbufferedPipeline(ControlBase):
651 """ A simple pipeline stage with single-clock synchronisation
652 and two-way valid/ready synchronised signalling.
653
654 Note that a stall in one stage will result in the entire pipeline
655 chain stalling.
656
657 Note also that unlike BufferedHandshake, the valid/ready signalling does NOT
658 travel synchronously with the data: the valid/ready signalling
659 combines in a *combinatorial* fashion. Therefore, a long pipeline
660 chain will lengthen propagation delays.
661
662 Argument: stage. see Stage API, above
663
664 stage-1 p.valid_i >>in stage n.valid_o out>> stage+1
665 stage-1 p.ready_o <<out stage n.ready_i <<in stage+1
666 stage-1 p.data_i >>in stage n.data_o out>> stage+1
667 | |
668 r_data result
669 | |
670 +--process ->-+
671
672 Attributes:
673 -----------
674 p.data_i : StageInput, shaped according to ispec
675 The pipeline input
676 n.data_o : StageOutput, shaped according to ospec
677 The pipeline output
678 r_data : output_shape according to ospec
679 A temporary (registered) copy of a prior (valid) processed result.
680 This is HELD if the output is not ready. It is updated
681 SYNCHRONOUSLY.
682 result: output_shape according to ospec
683 The output of the combinatorial logic. it is updated
684 COMBINATORIALLY (no clock dependence).
685
686 Truth Table
687
688 Inputs Temp Output Data
689 ------- - ----- ----
690 P P N N ~NiR& N P
691 i o i o NoV o o
692 V R R V V R
693
694 ------- - - -
695 0 0 0 0 0 0 1 reg
696 0 0 0 1 1 1 0 reg
697 0 0 1 0 0 0 1 reg
698 0 0 1 1 0 0 1 reg
699 ------- - - -
700 0 1 0 0 0 0 1 reg
701 0 1 0 1 1 1 0 reg
702 0 1 1 0 0 0 1 reg
703 0 1 1 1 0 0 1 reg
704 ------- - - -
705 1 0 0 0 0 1 1 reg
706 1 0 0 1 1 1 0 reg
707 1 0 1 0 0 1 1 reg
708 1 0 1 1 0 1 1 reg
709 ------- - - -
710 1 1 0 0 0 1 1 process(data_i)
711 1 1 0 1 1 1 0 process(data_i)
712 1 1 1 0 0 1 1 process(data_i)
713 1 1 1 1 0 1 1 process(data_i)
714 ------- - - -
715
716 Note: PoR is *NOT* involved in the above decision-making.
717 """
718
719 def elaborate(self, platform):
720 self.m = m = ControlBase.elaborate(self, platform)
721
722 data_valid = Signal() # is data valid or not
723 r_data = _spec(self.stage.ospec, "r_tmp") # output type
724
725 # some temporaries
726 p_valid_i = Signal(reset_less=True)
727 pv = Signal(reset_less=True)
728 buf_full = Signal(reset_less=True)
729 m.d.comb += p_valid_i.eq(self.p.valid_i_test)
730 m.d.comb += pv.eq(self.p.valid_i & self.p.ready_o)
731 m.d.comb += buf_full.eq(~self.n.ready_i_test & data_valid)
732
733 m.d.comb += self.n.valid_o.eq(data_valid)
734 m.d.comb += self.p._ready_o.eq(~data_valid | self.n.ready_i_test)
735 m.d.sync += data_valid.eq(p_valid_i | buf_full)
736
737 with m.If(pv):
738 m.d.sync += nmoperator.eq(r_data, self.data_r)
739 data_o = self._postprocess(r_data) # XXX TBD, does nothing right now
740 m.d.comb += nmoperator.eq(self.n.data_o, data_o)
741
742 return self.m
743
744
745 class UnbufferedPipeline2(ControlBase):
746 """ A simple pipeline stage with single-clock synchronisation
747 and two-way valid/ready synchronised signalling.
748
749 Note that a stall in one stage will result in the entire pipeline
750 chain stalling.
751
752 Note also that unlike BufferedHandshake, the valid/ready signalling does NOT
753 travel synchronously with the data: the valid/ready signalling
754 combines in a *combinatorial* fashion. Therefore, a long pipeline
755 chain will lengthen propagation delays.
756
757 Argument: stage. see Stage API, above
758
759 stage-1 p.valid_i >>in stage n.valid_o out>> stage+1
760 stage-1 p.ready_o <<out stage n.ready_i <<in stage+1
761 stage-1 p.data_i >>in stage n.data_o out>> stage+1
762 | | |
763 +- process-> buf <-+

764 Attributes:
765 -----------
766 p.data_i : StageInput, shaped according to ispec
767 The pipeline input
768 n.data_o : StageOutput, shaped according to ospec
769 The pipeline output
770 buf : output_shape according to ospec
771 A temporary (buffered) copy of a valid output
772 This is HELD if the output is not ready. It is updated
773 SYNCHRONOUSLY.
774
775 Inputs Temp Output Data
776 ------- - -----
777 P P N N ~NiR& N P (buf_full)
778 i o i o NoV o o
779 V R R V V R
780
781 ------- - - -
782 0 0 0 0 0 0 1 process(data_i)
783 0 0 0 1 1 1 0 reg (odata, unchanged)
784 0 0 1 0 0 0 1 process(data_i)
785 0 0 1 1 0 0 1 process(data_i)
786 ------- - - -
787 0 1 0 0 0 0 1 process(data_i)
788 0 1 0 1 1 1 0 reg (odata, unchanged)
789 0 1 1 0 0 0 1 process(data_i)
790 0 1 1 1 0 0 1 process(data_i)
791 ------- - - -
792 1 0 0 0 0 1 1 process(data_i)
793 1 0 0 1 1 1 0 reg (odata, unchanged)
794 1 0 1 0 0 1 1 process(data_i)
795 1 0 1 1 0 1 1 process(data_i)
796 ------- - - -
797 1 1 0 0 0 1 1 process(data_i)
798 1 1 0 1 1 1 0 reg (odata, unchanged)
799 1 1 1 0 0 1 1 process(data_i)
800 1 1 1 1 0 1 1 process(data_i)
801 ------- - - -
802
803 Note: PoR is *NOT* involved in the above decision-making.
804 """
805
806 def elaborate(self, platform):
807 self.m = m = ControlBase.elaborate(self, platform)
808
809 buf_full = Signal() # is data valid or not
810 buf = _spec(self.stage.ospec, "r_tmp") # output type
811
812 # some temporaries
813 p_valid_i = Signal(reset_less=True)
814 m.d.comb += p_valid_i.eq(self.p.valid_i_test)
815
816 m.d.comb += self.n.valid_o.eq(buf_full | p_valid_i)
817 m.d.comb += self.p._ready_o.eq(~buf_full)
818 m.d.sync += buf_full.eq(~self.n.ready_i_test & self.n.valid_o)
819
820 data_o = Mux(buf_full, buf, self.data_r)
821 data_o = self._postprocess(data_o) # XXX TBD, does nothing right now
822 m.d.comb += nmoperator.eq(self.n.data_o, data_o)
823 m.d.sync += nmoperator.eq(buf, self.n.data_o)
824
825 return self.m
826
827
828 class PassThroughHandshake(ControlBase):
829 """ A control block that delays by one clock cycle.
830
831 Inputs Temporary Output Data
832 ------- ------------------ ----- ----
833 P P N N PiV& PiV| NiR| pvr N P (pvr)
834 i o i o PoR ~PoR ~NoV o o
835 V R R V V R
836
837 ------- - - - - - -
838 0 0 0 0 0 1 1 0 1 1 odata (unchanged)
839 0 0 0 1 0 1 0 0 1 0 odata (unchanged)
840 0 0 1 0 0 1 1 0 1 1 odata (unchanged)
841 0 0 1 1 0 1 1 0 1 1 odata (unchanged)
842 ------- - - - - - -
843 0 1 0 0 0 0 1 0 0 1 odata (unchanged)
844 0 1 0 1 0 0 0 0 0 0 odata (unchanged)
845 0 1 1 0 0 0 1 0 0 1 odata (unchanged)
846 0 1 1 1 0 0 1 0 0 1 odata (unchanged)
847 ------- - - - - - -
848 1 0 0 0 0 1 1 1 1 1 process(in)
849 1 0 0 1 0 1 0 0 1 0 odata (unchanged)
850 1 0 1 0 0 1 1 1 1 1 process(in)
851 1 0 1 1 0 1 1 1 1 1 process(in)
852 ------- - - - - - -
853 1 1 0 0 1 1 1 1 1 1 process(in)
854 1 1 0 1 1 1 0 0 1 0 odata (unchanged)
855 1 1 1 0 1 1 1 1 1 1 process(in)
856 1 1 1 1 1 1 1 1 1 1 process(in)
857 ------- - - - - - -
858
859 """
860
861 def elaborate(self, platform):
862 self.m = m = ControlBase.elaborate(self, platform)
863
864 r_data = _spec(self.stage.ospec, "r_tmp") # output type
865
866 # temporaries
867 p_valid_i = Signal(reset_less=True)
868 pvr = Signal(reset_less=True)
869 m.d.comb += p_valid_i.eq(self.p.valid_i_test)
870 m.d.comb += pvr.eq(p_valid_i & self.p.ready_o)
871
872 m.d.comb += self.p.ready_o.eq(~self.n.valid_o | self.n.ready_i_test)
873 m.d.sync += self.n.valid_o.eq(p_valid_i | ~self.p.ready_o)
874
875 odata = Mux(pvr, self.data_r, r_data)
876 m.d.sync += nmoperator.eq(r_data, odata)
877 r_data = self._postprocess(r_data) # XXX TBD, does nothing right now
878 m.d.comb += nmoperator.eq(self.n.data_o, r_data)
879
880 return m
881
882
883 class RegisterPipeline(UnbufferedPipeline):
884 """ A pipeline stage that delays by one clock cycle, creating a
885 sync'd latch out of data_o and valid_o as an indirect byproduct
886 of using PassThroughStage
887 """
888 def __init__(self, iospecfn):
889 UnbufferedPipeline.__init__(self, PassThroughStage(iospecfn))
890
891
892 class FIFOControl(ControlBase):
893 """ FIFO Control. Uses Queue to store data, coincidentally
894 happens to have the same valid/ready signalling as the Stage API.
895
896 data_i -> fifo.din -> FIFO -> fifo.dout -> data_o
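
    A rough construction sketch (MyStage is a hypothetical Stage):

        fc = FIFOControl(depth=4, stage=MyStage(), fwft=True, pipe=False)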
897 """
898 def __init__(self, depth, stage, in_multi=None, stage_ctl=False,
899 fwft=True, pipe=False):
900 """ FIFO Control
901
902 * :depth: number of entries in the FIFO
903 * :stage: data processing block
904 * :fwft: first word fall-thru mode (non-fwft introduces delay)
905 * :pipe: specifies pipe mode.
906
907 when fwft = True it indicates that transfers may occur
908 combinatorially through stage processing in the same clock cycle,
909 i.e. that outputs may depend directly on current inputs (a Mealy FSM):
910 https://en.wikipedia.org/wiki/Mealy_machine
911 
912 when fwft = False it indicates that all output signals are
913 produced only from internal registers or memory, i.e. that outputs
914 depend only on stored state (a Moore FSM):
915 https://en.wikipedia.org/wiki/Moore_machine
916
917 data is processed (and located) as follows:
918
919 self.p self.stage temp fn temp fn temp fp self.n
920 data_i->process()->result->cat->din.FIFO.dout->cat(data_o)
921
922 yes, really: cat produces a Cat() which can be assigned to.
923 this is how the FIFO gets de-catted without needing a de-cat
924 function
925 """
926 self.fwft = fwft
927 self.pipe = pipe
928 self.fdepth = depth
929 ControlBase.__init__(self, stage, in_multi, stage_ctl)
930
931 def elaborate(self, platform):
932 self.m = m = ControlBase.elaborate(self, platform)
933
934 # make a FIFO with a signal of equal width to the data_o.
935 (fwidth, _) = nmoperator.shape(self.n.data_o)
936 fifo = Queue(fwidth, self.fdepth, fwft=self.fwft, pipe=self.pipe)
937 m.submodules.fifo = fifo
938
939 def processfn(data_i):
940 # store result of processing in combinatorial temporary
941 result = _spec(self.stage.ospec, "r_temp")
942 m.d.comb += nmoperator.eq(result, self.process(data_i))
943 return nmoperator.cat(result)
944
945 # prev: make the FIFO (Queue object) "look" like a PrevControl...
946 m.submodules.fp = fp = PrevControl()
947 fp.valid_i, fp._ready_o, fp.data_i = fifo.we, fifo.writable, fifo.din
948 m.d.comb += fp._connect_in(self.p, fn=processfn)
949
950 # next: make the FIFO (Queue object) "look" like a NextControl...
951 m.submodules.fn = fn = NextControl()
952 fn.valid_o, fn.ready_i, fn.data_o = fifo.readable, fifo.re, fifo.dout
953 connections = fn._connect_out(self.n, fn=nmoperator.cat)
954 valid_eq, ready_eq, data_o = connections
955
956 # ok ok so we can't just do the ready/valid eqs straight:
957 # first 2 from connections are the ready/valid, 3rd is data.
958 if self.fwft:
959 m.d.comb += [valid_eq, ready_eq] # combinatorial on next ready/valid
960 else:
961 m.d.sync += [valid_eq, ready_eq] # non-fwft mode needs sync
962 data_o = self._postprocess(data_o) # XXX TBD, does nothing right now
963 m.d.comb += data_o
964
965 return m
966
967
968 # aka "RegStage".
969 class UnbufferedPipeline(FIFOControl):
970 def __init__(self, stage, in_multi=None, stage_ctl=False):
971 FIFOControl.__init__(self, 1, stage, in_multi, stage_ctl,
972 fwft=True, pipe=False)
973
974 # aka "BreakReadyStage" XXX had to set fwft=True to get it to work
975 class PassThroughHandshake(FIFOControl):
976 def __init__(self, stage, in_multi=None, stage_ctl=False):
977 FIFOControl.__init__(self, 1, stage, in_multi, stage_ctl,
978 fwft=True, pipe=True)
979
980 # this is *probably* BufferedHandshake, although test #997 now succeeds.
981 class BufferedHandshake(FIFOControl):
982 def __init__(self, stage, in_multi=None, stage_ctl=False):
983 FIFOControl.__init__(self, 2, stage, in_multi, stage_ctl,
984 fwft=True, pipe=False)
985
986
987 """
988 # this is *probably* SimpleHandshake (note: memory cell size=0)
989 class SimpleHandshake(FIFOControl):
990 def __init__(self, stage, in_multi=None, stage_ctl=False):
991 FIFOControl.__init__(self, 0, stage, in_multi, stage_ctl,
992 fwft=True, pipe=False)
993 """