[libre-riscv-dev] Some recent documenting of work performed for tape-out
1 Return-path: <libre-riscv-dev-bounces@lists.libre-riscv.org>
2 Envelope-to: publicinbox@libre-riscv.org
3 Delivery-date: Sat, 28 Mar 2020 14:08:34 +0000
4 Received: from localhost ([::1] helo=libre-riscv.org)
5 by libre-riscv.org with esmtp (Exim 4.89)
6 (envelope-from <libre-riscv-dev-bounces@lists.libre-riscv.org>)
7 id 1jIC8W-0004aH-R7; Sat, 28 Mar 2020 14:08:32 +0000
8 Received: from vps2.stafverhaegen.be ([85.10.201.15])
9 by libre-riscv.org with esmtp (Exim 4.89)
10 (envelope-from <staf@fibraservi.eu>) id 1jIC8U-0004aB-Gu
11 for libre-riscv-dev@lists.libre-riscv.org; Sat, 28 Mar 2020 14:08:30 +0000
12 Received: from hpdc7800 (hpdc7800 [10.0.0.1])
13 by vps2.stafverhaegen.be (Postfix) with ESMTP id CAAF811C040B
14 for <libre-riscv-dev@lists.libre-riscv.org>;
15 Sat, 28 Mar 2020 15:08:29 +0100 (CET)
16 Message-ID: <0d35e45bd81eeaecedeb64dc5061c1e33c89630c.camel@fibraservi.eu>
17 From: Staf Verhaegen <staf@fibraservi.eu>
18 To: libre-riscv-dev@lists.libre-riscv.org
19 Date: Sat, 28 Mar 2020 15:08:25 +0100
20 In-Reply-To: <CAPweEDzAtWoU+wc6MTayF1vtKJvrxLLfP-Q1Czea+NX5MgOrfg@mail.gmail.com>
21 References: <CAPweEDx5QCCKxSr1gfuyuw_2D68Ld8fK85bEmmMTZi8S3w2E9g@mail.gmail.com>
22 <29b1a9ecedda151dc9c8da6516c3691dfede62ef.camel@fibraservi.eu>
23 <CAPweEDwfqMczPjg=5Fvt1J_S8nx1YK44XhyBY8H1abuTNF6=xg@mail.gmail.com>
24 <6fa40cb78b3f8c013ca4953ccb4daa5c23e3b501.camel@fibraservi.eu>
25 <CAPweEDxiyTEsneXN65Kq0HsEsdL3wdY=NYayq2tz5egXJNCVfg@mail.gmail.com>
26 <e430ea6587d292166fd58460adf4dfebfad20c6d.camel@fibraservi.eu>
27 <CAPweEDzEvtPYGKvGMvebmQzhJDhSgfvUOVZvB2WXxSbv_ebE8A@mail.gmail.com>
28 <b18283c7e7a93fa8afdef2f0a8679b26e4569528.camel@fibraservi.eu>
29 <CAPweEDwznLD5o6rHfWsSXR-8e1hbAfAB04f5O+YkL6pCwGsNfQ@mail.gmail.com>
30 <6fbfb2a3258be77f4fce69661b283dc31a683f7b.camel@fibraservi.eu>
31 <CAPweEDwf7s=r6bhq6N=VG7QQ1iD4jHYEG6mGvtxL32Uxnhzqwg@mail.gmail.com>
32 <9e44930a0332eff507661e617796b9d0674b0e05.camel@fibraservi.eu>
33 <CAPweEDzAtWoU+wc6MTayF1vtKJvrxLLfP-Q1Czea+NX5MgOrfg@mail.gmail.com>
34 Organization: FibraServi bvba
35 X-Mailer: Evolution 3.28.5 (3.28.5-5.el7)
36 Mime-Version: 1.0
37 X-Content-Filtered-By: Mailman/MimeDel 2.1.23
38 Subject: [libre-riscv-dev] Clock Gating (was cache SRAM organisation)
39 X-BeenThere: libre-riscv-dev@lists.libre-riscv.org
40 X-Mailman-Version: 2.1.23
41 Precedence: list
42 List-Id: Libre-RISCV General Development
43 <libre-riscv-dev.lists.libre-riscv.org>
44 List-Unsubscribe: <http://lists.libre-riscv.org/mailman/options/libre-riscv-dev>,
45 <mailto:libre-riscv-dev-request@lists.libre-riscv.org?subject=unsubscribe>
46 List-Archive: <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/>
47 List-Post: <mailto:libre-riscv-dev@lists.libre-riscv.org>
48 List-Help: <mailto:libre-riscv-dev-request@lists.libre-riscv.org?subject=help>
49 List-Subscribe: <http://lists.libre-riscv.org/mailman/listinfo/libre-riscv-dev>,
50 <mailto:libre-riscv-dev-request@lists.libre-riscv.org?subject=subscribe>
51 Reply-To: Libre-RISCV General Development
52 <libre-riscv-dev@lists.libre-riscv.org>
53 Content-Type: multipart/mixed; boundary="===============5623462699142308802=="
54 Errors-To: libre-riscv-dev-bounces@lists.libre-riscv.org
55 Sender: "libre-riscv-dev" <libre-riscv-dev-bounces@lists.libre-riscv.org>
56
57
58 --===============5623462699142308802==
59 Content-Type: multipart/signed; micalg="pgp-sha1"; protocol="application/pgp-signature";
60 boundary="=-FBhRFmonkZ2xasih2XbI"
61
62
63 --=-FBhRFmonkZ2xasih2XbI
64 Content-Type: text/plain; charset="UTF-8"
65 Content-Transfer-Encoding: quoted-printable
66
Luke Kenneth Casson Leighton wrote on Fri 27-03-2020 at 10:59 [+0000]:
> On Fri, Mar 27, 2020 at 10:36 AM Staf Verhaegen <staf@fibraservi.eu> wrote:
> > Yes and no, it is the basic functionality of a pipeline :(
>
> yes.
> > You have the same latency but can have double the number of operations in flight.
>
> yes. hence why it is so important to have, because double the number of operations means that we need double the number of Function Units in the Dependency Matrix in order to keep the entire out-of-order engine occupied.
> also, double the number of operations in flight means that we need double the number of Branch Prediction Units, and much more complex BPUs at that, just to deal with the (now very likely) scenario of having far more overlapping inner loops "in flight".
> all this from just extending the pipeline length(s) from 5 to 10. so it's not just a "nice-to-have" feature, it's actually really important to keeping the overall size of the chip down.

There is an (IMO better) alternative to what you are doing with your pass-through registers, and that is clock gating (wikipedia, allaboutcircuits).
The principle is that you save power by not clocking the parts of the circuit that don't have to do any computing. I think this could be a more general way to enable only the stages in your pipeline that are actually doing computation.
In the above example you would always use a 10-stage pipeline running at 1600MHz, but to mimic the 5-stage pipeline you only submit an operation every other clock cycle and alternately enable the odd and even stages in your pipeline. This way the MUXes are removed from the computation path.
Using a shift register, this can easily be generalized to enable only the stages that have an operation going through them. When an operation is submitted you set the first bit in the shift register to enable the first stage in the pipeline. With each cycle you then shift this bit, so that the stage needed for the execution of that operation is the one that is active.
This is a general power optimization, because it means that if you are running a program that only uses integer operations your FPU and GPU will use almost no power.
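
The shift-register scheme described above can be sketched as a small behavioural model in plain Python. This is not RTL and not part of the original mail; the names `step`, `run` and `N_STAGES` are made up for illustration. It only shows which stage enables are set after each clock edge, assuming the enable bits move along by one stage per clock.

```python
# Behavioural model of the stage-enable shift register (plain Python, not RTL).
# All names here are hypothetical and chosen only for this illustration.
N_STAGES = 10  # pipeline depth from the example in the text


def step(stages_en, newop):
    """One clock edge: shift every enable bit one stage along, insert newop at stage 0."""
    return [newop] + stages_en[:-1]


def run(newops):
    """Return the enable vector after each clock edge for a list of per-cycle newop bits."""
    stages_en = [0] * N_STAGES
    history = []
    for newop in newops:
        stages_en = step(stages_en, newop)
        history.append(stages_en)
    return history


# Submit an operation every other cycle, mimicking the 5-stage configuration.
history = run([1, 0] * 10)
for cycle, en in enumerate(history):
    print(cycle, "".join(str(bit) for bit in en))

# A single operation: exactly one stage is enabled on each cycle.
single = run([1] + [0] * (N_STAGES - 1))
```

Once the pipeline has filled, the first run has half of the stages enabled on any cycle, alternating between the even and the odd positions. In the second run a single enable bit walks down the pipeline with the operation.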

The way to implement it is with EnableInserter. Some untested code showing how I think it can be done:
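
from nmigen import Cat, EnableInserter, Signal

# One enable bit per pipeline stage; bit 0 gates the first stage.
stages_en = Signal(10)
stage1 = EnableInserter(stages_en[0])(Stage1())
stage2 = EnableInserter(stages_en[1])(Stage2())
...

# Every cycle: shift the enable bits one stage along and shift in newop.
m.d.sync += stages_en.eq(Cat(newop, stages_en[0:9]))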
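
To illustrate what the enables buy, here is a rough behavioural model in plain Python (not nmigen, and not from the original mail). `GatedPipeline`, `clock`, `operand` and `loads` are hypothetical names. The model assumes that the operand is captured together with `newop`, and that a stage register holds its value whenever its enable bit is low. The `loads` counter is only a crude proxy for clock activity.

```python
# Behavioural sketch of an enable-gated pipeline (plain Python, not RTL).
# All class and attribute names are hypothetical, for illustration only.
class GatedPipeline:
    def __init__(self, stage_fns):
        self.stage_fns = stage_fns
        n = len(stage_fns)
        self.en = [0] * n        # enable shift register, one bit per stage
        self.regs = [None] * n   # output register of each stage
        self.operand = None      # operand captured together with newop
        self.loads = 0           # number of register updates so far

    def clock(self, newop, data=None):
        """Model one clock edge."""
        old_regs = list(self.regs)  # values visible before the edge
        for i, fn in enumerate(self.stage_fns):
            if self.en[i]:  # a gated-off stage keeps its old value
                src = self.operand if i == 0 else old_regs[i - 1]
                self.regs[i] = fn(src)
                self.loads += 1
        # Shift the enables along and capture the new operand, if any.
        self.en = [newop] + self.en[:-1]
        if newop:
            self.operand = data


# Ten stages that each add 1; submit a single operation and let it drain.
pipe = GatedPipeline([lambda x: x + 1] * 10)
pipe.clock(1, 5)
for _ in range(10):
    pipe.clock(0)
print(pipe.regs[-1], pipe.loads)
```

In this model each stage register is written exactly once per operation, whereas an ungated pipeline would clock all ten registers on every edge.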

That said, I think this feature does not fit in the MVP scope of the October prototype, so that chip should IMO use neither clock gating nor the pass-through register feature from the original discussion. The reason is that implementing it is easier said than done. Several things need to be done:
- You first need a clock gating cell. This is not available in nsxlib and is currently not planned to be implemented. I don't want to commit to something extra for the May test chip tape-out either.
- nmigen/yosys needs to properly support clock gating for ASICs. Likely this means work in yosys to insert the clock gates from if clauses in the RTL.
- Your P&R tool (e.g. Coriolis) needs to support the clock gates. It means your clock tree synthesis (CTS) needs to support more than just buffers in the clock tree. This is not a simple task and has to be discussed with Jean-Paul & co.

greets,
Staf.
133
134 --=-FBhRFmonkZ2xasih2XbI--
135
136
137
149 --===============5623462699142308802==--
150
151
152