# Analysis of Options for Memory Interfaces for a Mobile-class Libre SoC

This document covers why, on risk-reduction and practical grounds,
DDR3/DDR3L/LPDDR3 is the best option for a mobile-class SoC
*at the time of writing*.

The requirements which minimise risk are:

* Reasonable power consumption: a power budget for the RAM ICs of well
  below 1.5 watts.
* Minimum or equivalent of 700MHz @ 32-bit transfers (so a 350MHz clock rate
  for a total of 700MHz DDR @ 32-bit, or 175MHz @ 64-bit, or 700MHz @ 16-bit)
* Mass-volume pricing
* High availability
* Multiple suppliers
* No more than 15 cm^2 of board area required for RAM plus routing to SoC
  (just about covers 4x DDR3 78-pin FBGA ICs, or 4x DDR3 96-pin FBGA ICs).
  Around 15 cm^2 is quite generous, and is practical for making a credit-card
  sized SBC with all RAM ICs and the SoC on the TOP side of the PCB.
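
The clock-rate equivalences in the transfer-rate requirement above can be
checked with a few lines of arithmetic.  This is a minimal sketch, assuming
that "700MHz DDR" means 700 million transfers per second per data pin; the
function name and its parameters are purely illustrative.

```python
def ddr_clock_mhz(target_rate_mhz: float, target_width_bits: int,
                  bus_width_bits: int) -> float:
    """Clock (MHz) a DDR bus of bus_width_bits needs to match the target."""
    # Total data moved by the target configuration, in megabits per second.
    target_mbits = target_rate_mhz * target_width_bits
    # Transfers per second (in millions) needed on the candidate bus width.
    transfers = target_mbits / bus_width_bits
    # DDR performs two transfers per clock cycle.
    return transfers / 2


for width in (16, 32, 64):
    print(f"{width}-bit bus: {ddr_clock_mhz(700, 32, width):.0f}MHz clock")
```

Narrower buses need proportionally faster clocks to deliver the same total
bandwidth, which is why the 16-bit case ends up at 700MHz.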

Each of these will be covered in turn, below.  Then, there will be a
separate section covering the various types of RAM offerings, including
some innovative (research-style) ideas.  These are:

* Package-on-Package (PoP)
* RAM in-package (known as Multi-Chip Modules)
* MCM standard and *non*-standard interfaces (custom-designed)
* Standard off-the-shelf die vs custom-made DRAM or SRAM ASIC
* DDR1, DDR2, DDR3, DDR4 ....

# Requirements

## Power Consumption

Lowering the power consumption is simply a practical consideration to keep
cost down and make component selection, manufacturing and design of the PCB
easier.  For example: if the AXP209 can be utilised as the PMIC for the
entire product, that is a USD $0.5 part, and its layout, the amount of
current it consumes, and general support including linux drivers make it an
easy choice.  On the other hand, if more complex PMIC layouts are required,
that automatically pushes pricing up and introduces risk and NREs.

Therefore if the total budget for the entire design can be kept below
around 3.5 watts, which translates roughly to around 1.5 watts for the memory
and around 1.5 to 2 watts for the SoC, a lower-cost PMIC can be deployed *and*
there is a lot less to worry about when it comes to thermal dissipation.

Note that according to Micron's Technical Note TN-41-01, a single x16 1033MHz
DDR3 (not DDR3L) DRAM can consume 436mW on its own.  If two of those
are deployed to give a 32-bit-wide memory interface @ 1033MHz, that's
872mW, which is just about acceptable.  It would be much better to
consider using DDR3L (1.35V instead of 1.5V), as power consumption falls
roughly with the square of the voltage, giving an approximate 20% drop.
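
The square-law estimate can be worked through as follows.  This is only a
sketch: it assumes that *all* of the 436mW scales with the square of the
supply voltage, which is a simplification, and the function name is
illustrative.

```python
def scale_power_by_voltage(power_mw: float, v_old: float, v_new: float) -> float:
    """Scale a power figure, assuming power is proportional to voltage squared."""
    return power_mw * (v_new / v_old) ** 2


ddr3_pair_mw = 2 * 436  # two x16 DDR3 ICs at 1.5V, per-IC figure from the text
ddr3l_pair_mw = scale_power_by_voltage(ddr3_pair_mw, 1.5, 1.35)
saving = 1 - ddr3l_pair_mw / ddr3_pair_mw

print(f"DDR3 pair:  {ddr3_pair_mw}mW")
print(f"DDR3L pair: {ddr3l_pair_mw:.0f}mW ({saving:.0%} lower)")
```

Under that assumption the pair of ICs comes out at roughly 706mW, a 19%
reduction, which is in line with the approximate 20% quoted.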

## Minimum 700MHz @ 32-bit transfer rates

This is a practical consideration for delivering reasonable performance
and being able to cover 720p / 1080p video playback without stalling.
Once decoded from their compressed format, video frames take up
an enormous amount of memory bandwidth: framebuffers are too large to be
cached on-chip, so have to be written out to RAM and then read back in
again.  Video (and 3D) therefore have a massive impact on the SoC's
performance when using a lower-cost "shared memory bus" architecture.
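
To give a feel for the numbers, the sketch below estimates the memory
traffic of decoded video.  The figures of 4 bytes per pixel, 60 frames per
second, and exactly one write plus one read per frame are assumptions chosen
for illustration; a real decode and display pipeline may touch each frame
more or fewer times.

```python
def framebuffer_traffic_mb_per_s(width: int, height: int, bytes_per_pixel: int,
                                 fps: int, passes: int = 2) -> float:
    """Memory traffic in MB/s for writing each frame out and reading it back."""
    frame_bytes = width * height * bytes_per_pixel
    return frame_bytes * fps * passes / 1e6


for name, (w, h) in {"720p": (1280, 720), "1080p": (1920, 1080)}.items():
    print(f"{name}: {framebuffer_traffic_mb_per_s(w, h, 4, 60):.0f}MB/s")
```

With these assumptions 1080p playback alone accounts for close to 1 gigabyte
per second of traffic, before the CPU or any other peripheral has touched
the shared bus.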

1.4 Gigabytes per second of raw reads/writes is therefore a reasonable
compromise between development costs, overall system price, running too
hot, and running so slow that users start complaining or cannot play
certain videos or applications at all.
If better than this absolute minimum can be achieved within the power
budget, that would be great.
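
The text does not spell out how the 1.4 Gigabytes per second figure relates
to the 700MHz @ 32-bit requirement.  As a rough cross-check, the sketch
below computes the *theoretical peak* bandwidth of a DDR bus, again assuming
that the quoted MHz figures are millions of transfers per second.  Sustained
throughput is always lower than the peak, because of refresh cycles and
read/write turnaround.

```python
def peak_bandwidth_gb_per_s(rate_mhz: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return rate_mhz * 1e6 * bytes_per_transfer / 1e9


for rate in (700, 1066):
    peak = peak_bandwidth_gb_per_s(rate, 32)
    print(f"{rate}MHz DDR @ 32-bit: {peak:.2f}GB/s peak")
```

On this reading, 1.4 Gigabytes per second is half the peak of a 700MHz
32-bit bus, so it may be intended as a sustained rather than a peak figure.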

Other options include going for a 64-bit wide total bus width,
which can be achieved with either 4x 16-bit FBGA96 ICs, or 2x 32-bit
FBGA168 LPDDR3 ICs.  The issue is that this assumes it's okay to
correspondingly increase the SoC's pincount by an extra 100 pins,
in order to cover dual 32-bit DRAM interfaces.
Aside from the increased licensing costs and power consumption associated
with twin DRAM interfaces, the current proposed SoC is only 290 pins, meaning
that it can be done as a 0.8mm pitch BGA that is only around 15mm on
a side.  That makes it extremely low-cost and very easy to manufacture,
even making it possible to consider 4-layer PCBs and 10mil drill-holes
(very cheap).

If the pincount were increased to 400 it would be necessary to go to
a 0.6mm pin pitch in order to keep the package size down.  That in
turn increases manufacturing costs (6-7 mil BGA via drill-holes, requiring
laser-drilling) and so on.  Whilst it seems strange to consider the
pin count and pin pitch of an SoC when considering something like the
bandwidth of the memory bus, it goes some way to illustrate quite how
interconnected everything really is.
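
The relationship between pin count, pin pitch and package size can be
approximated as below.  This is a back-of-envelope sketch only: it assumes a
square, fully populated ball grid with a margin of one pitch on each edge,
whereas real packages often depopulate the centre or use different margins.

```python
import math


def bga_side_mm(pin_count: int, pitch_mm: float) -> float:
    """Approximate side length of a square, fully populated BGA package."""
    balls_per_side = math.ceil(math.sqrt(pin_count))
    # (n - 1) gaps between n balls, plus an assumed one-pitch margin per edge.
    return (balls_per_side - 1) * pitch_mm + 2 * pitch_mm


for pins, pitch in ((290, 0.8), (400, 0.8), (400, 0.6)):
    side = bga_side_mm(pins, pitch)
    print(f"{pins} pins @ {pitch}mm pitch: ~{side:.1f}mm per side")
```

Under these assumptions 290 pins at 0.8mm pitch gives roughly 15mm per side,
matching the figure above, whilst 400 pins grows to nearly 17mm unless the
pitch is reduced to 0.6mm.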

Bottom line: yes you *could* go to dual 32-bit-wide DDR RAM interfaces,
but the *production* cost increases in doing so need to be taken into
consideration.  Some SoCs do actually have only a 16-bit wide DDR RAM
interface: these tend not to be very popular (or are used in specialist
markets such as smart watches) as the reduction in memory bandwidth tends
to go hand-in-hand with ultra-low-power scenarios.  Try putting them into
the hands of mass-volume end users running general-purpose OSes such as
Android and the users only complain and consider their purchase to have
been a total waste of money.  32-bit-wide at around 1066MHz seems
to be an acceptable compromise on all fronts.

## Mass-volume Pricing, High availability, Multiple Suppliers

These are all important inter-related considerations.  Surprisingly,
older ICs *and* newer ICs tend to be higher cost.  It comes down to
what is currently available and being mass-produced.  Older ICs fall
out of popularity and thus become harder to find, or move to "legacy"
foundries that have a higher cost per unit production.

Newer ICs tend to be higher speeds and higher capacities, meaning that
the yields are lower and the demand higher.  Costs can be sky high on a
near-exponential curve based on capacity and speed compared to other
offerings.

Picking the right RAM interface (*and* picking the right speed grade range
and bus bandwidth) so as to ensure that the entire SoC
has a useful lifetime is therefore really rather important!  If the
expected lifetime is to be, for example, 5 years, it would be foolish
to pick a DDR RAM interface where, towards the end of those 5 years,
the cost of the only available RAM ICs is ten times higher than it
was when the SoC first came out.

In short - jumping the gun somewhat on why this document has been
written - this means that DDR3/DDR3L/LPDDR3 is the preferred interface
*at the moment*, given especially that SoCs such as the iMX6 have a
support profile (lifetime) of 19 years, another 15 of which are
still to go before the iMX6 reaches EOL.  Whilst DDR4/LPDDR4 would be
"nice to have", it has simply not yet reached the point where
it's commonly available from multiple suppliers, and will not do
so for many years yet.  It will require at least two Chinese
Memory Manufacturers (not just Hynix, Micron and Samsung basically)
before it starts to become price-competitive.  A quick search
on taobao.com for Hynix P/N H9HCNNNBUUMLHR basically tells you
what you need to know: very few suppliers, all with multiple
"fake" listings, fluffing themselves up like a peacock
to make themselves appear more attractive.  Compare that to searching
for P/N H5TC4G63CFR on taobao: there are 5 *pages*
of results from wildly disparate sellers, all at roughly the
same price of RMB 20 (around USD $3), which tells you that it's
mass-produced and commonly available.

## Board area

15 cm^2 is about the minimum in which four x8 or x16 DDR3 RAM ICs
can be accommodated, including their routing, on one side of the PCB.
There are other arrangements; however, 15 cm^2 is still reasonable
for the majority of products, with the exception of mobile phones and
smaller sized smartphones.  7in Tablets, SBCs, Netbooks, Media Centres:
all these products can be designed with a 15 cm^2 budget for RAM, and
meet a very reasonable target price due to not needing 8+ layers, blind
vias, double-sided reflow involving epoxy resin to glue underside ICs,
or other strategies that get really quite expensive if they are to be
considered for small initial production runs.

With massive production budgets to jump over many of the hurdles, there is
nothing to be concerned about.  However if considering a production and
design budget below USD $50,000 and initial production runs using Shenzhen
factories for pre-production and prototyping, "techniques" such as
blind vias, 8+ layer PCBs and epoxy resin for gluing ICs onto the underside
of PCBs quickly become cost-prohibitive, despite the costs averaging out
by the time mass-production is reached.

So there is a barrier to entry to overcome, and the simplest way to
overcome that is to not get into the "small PCB budget" territory that
requires these techniques in the first place.

# RAM Design Options

This section covers various options for board layout and IC selection,
including custom-designing ICs.

## Multi-Chip Modules

This is basically where the SoC and the RAM bare die are on a common
PCB *inside* the same IC packaging.  Routing between the dies is carried
out on the common PCB, which is usually multi-layer.

The down-side is that it requires large up-front costs to produce, plus
an overhead on production costs when compared to separate ICs.  However,
the space and pincount savings can be enormous: one IC taking up 1.5 cm^2
instead of up to 15 cm^2 for a larger SoC plus routing plus 4 DRAM ICs,
plus a saving of around 75 pins, as the 32-bit-wide DDR RAM interface no
longer needs to be brought out.

On the other hand, beyond a certain speed (and number of dies on-board), the
amount of power consumption could potentially exceed the thermal capacity
of smaller packages in the first place.

The short version is: for smaller DRAM sizes (32MB up to 256MB), on-board
RAM as a Multi-Chip Module has proven extremely successful, as evidenced
by the Ingenic M200 and X1000 SoCs that are used in smart watches sold in
China.  Beyond that capacity (512MB and above) the cost of the resultant
multi-die chip appears less attractive than a multi-chip solution, meaning
that it is quite a risky investment proposition.

## Package-on-Package RAM

The simplest way to express how much PoP RAM is not a good idea is
to reference the following, an analysis of a rather useful but
very expensive lesson:
<http://laforge.gnumonks.org/blog/20170306-gta04-omap3_pop_soldering/>

Package-on-Package RAM basically saves a lot of space on a PCB by stacking
ICs vertically.  It's typically used in mobile phones where space is at
a premium, yet the flexibility when compared to (fixed capacity) Multi-Chip
Modules is desirable.

The problem comes in assembly, as the GTA04 / GTA05 team found out to their
cost.  In the case of the TI SoC selected, it was discovered - *after* the
design had been finalised and pre-production prototypes were being assembled -
that the SoC actually *warped and buckled* under the heat of the reflow oven.
"Fixing" this involves extremely careful analysis and much more costly
equipment than is commonly available, plus trying tricks such as covering
the SoC and the PoP RAM in U.V. sensitive epoxy resin prior to placing it
into the reflow oven, as a way to make sure that the IC "stack" has a
reduced chance of warpage.

Normally, a PoP RAM supplier, knowing that these problems can occur, simply
will not sell the RAM to a manufacturer unless they have proven expertise
or deep pockets to solve these kinds of issues.  Nokia for example was known,
in one case, to have tried and failed sufficient times that they
had around 10,000 to 50,000 production-grade PCBs that needed to be recovered
before they managed to find a solution.  Once they had succeeded they went
back to those failed units, had the SoC and PoP RAM removed (and either
re-balled or, if too badly warped, simply thrown out), and re-processed
the PCBs with new PoP RAM and SoC on them rather than write them off entirely:
still a costly proposition all on its own.

In short: Package-on-Package RAM is only something that, realistically, a
multi-billion-dollar company can consider, when the supply volumes are
*guaranteed* to exceed tens of millions of units.

## Multi-chip Module RAM Interfaces

One possibility would be to consider custom-designing
a RAM IC rather than using a standard (JEDEC) RAM interface, or even using
some kind of pre-existing Bus (ATI, Wishbone, AXI).  When DDR (JEDEC) standard
interfaces are utilised, the advantage is that off-the-shelf die pricing
and supply can be negotiated with any of the DRAM vendors.

However, in a fully libre IC, if that is indeed one of the goals,
it becomes necessary to actually implement the DRAM interface (JEDEC
standard DDRn).  Several independent designers have considered this:
there even exist two published DDR3 designs that are already available
online, the only problem being: they are Controllers, not including the
PHY (actual pin pads).

So to save on doing that, logically we might consider utilising a
pre-existing bus for which the VHDL / Verilog source code already
exists: ATI Bus, SRAM Bus, even ONFI, or better Wishbone or AXI.
The only problem is: now that you are into non-standard territory,
it becomes necessary to consider *designing and making your own DRAM*.
This is covered in the following section.

## Custom DRAM or SRAM vs off-the-shelf dies

The distinct advantage of an off-the-shelf die that conforms to the JEDEC
DDR1/2/3/4 standard is: it's a known quantity, mass-produced (all the
advantages already described above).  We might reasonably wish to consider
utilising SRAM instead, but SRAM is a multi-gate solution per "bit" whereas
a DRAM cell is basically a capacitor, taking up only one gate's worth of
space per bit: absolutely tiny, in other words, which is why it's used.

Not only that but, in creating your own custom DRAM, you in effect
become your own "single supplier", with Research and Development overheads
to take into consideration as well.

In short: it's a huge risk with no guaranteed payoff, and not only that
but if the development of the alternative DRAM fails but the SoC was
designed exclusively without a JEDEC-standard DRAM interface on the
expectation that the alternative DRAM *would* succeed, the SoC is now
up the creek without a paddle.

In reverse-engineering terms: the rule of thumb is, you never make more
than one change at a time, because then you cannot tell which change
actually caused the error.  An adaptation of this rule of thumb applies
here: there are *three* changes being made: one to use a non-standard
Memory interface, two to develop an entirely new DRAM chip and three to
use the same non-standard Memory interface *on* that DRAM IC.  In short,
it's too much to consider all at once.

## DDR1..DDR4

Overall it's pointing towards using one of the standard JEDEC DDR interfaces.
DDR1 only runs at 133MHz and the power consumption is enormous: 1.8V and above
is not uncommon.  DDR2 again is too slow and too power-hungry.  DDR3 hits
the right spot in terms of "common mass production" whereas DDR4, despite
its speed and power consumption advantages, is migrating towards being
too challenging.

In an earlier section the availability of LPDDR4 RAM ICs, which would be great
to use if they were easily accessible, was shown to be far too low.  Not only
that but DDR4 typically runs at a 2400MHz DDR rate: 1200MHz (1.2GHz!)
signal paths.  It's now necessary to take into consideration the length of
the tracks *on the actual dies* - both in the SoC and inside the DRAM - when
designing the tracks between the two.  It's just far too risky to consider
tackling.
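
To illustrate why track lengths start to matter so much at these speeds, the
sketch below compares the time available per bit with the skew introduced by
a mismatch in track length.  The propagation speed of roughly 150mm per
nanosecond is an assumed, typical figure for FR4 and varies with the PCB
stack-up; the 5mm mismatch is purely an example.

```python
# Assumption: signals on FR4 travel at roughly 150mm per nanosecond
# (about half the speed of light); real boards vary with stack-up.
PROPAGATION_MM_PER_NS = 150.0


def unit_interval_ps(data_rate_mhz: float) -> float:
    """Time per bit on one data line, in picoseconds."""
    return 1e6 / data_rate_mhz


def skew_ps(length_mismatch_mm: float) -> float:
    """Timing skew caused by a track-length mismatch, in picoseconds."""
    return length_mismatch_mm / PROPAGATION_MM_PER_NS * 1000


for name, rate in (("DDR3 @ 1066MHz", 1066), ("DDR4 @ 2400MHz", 2400)):
    ui = unit_interval_ps(rate)
    share = skew_ps(5) / ui
    print(f"{name}: {ui:.0f}ps per bit; a 5mm mismatch uses {share:.0%} of it")
```

The same physical mismatch eats more than twice as much of the timing
budget at 2400MHz as it does at 1066MHz, which is the point at which lengths
inside the packages and on the dies can no longer be ignored.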

So overall this is reinforcing that DDR3/DDR3L/LPDDR3 is the right choice
*at this time*.

# Conclusion: DDR3/DDR3L/LPDDR3

DDR3 basically meets the requirements.

* 4x DDR3L 8-bit FBGA78 ICs @ 1066MHz meets the power budget
* Likewise 2x DDR3L 16-bit FBGA96 @ 1066MHz
* Likewise 1x LPDDR3 32-bit FBGA168 @ 1866MHz
* Pricing and availability are good on x8 and x16 DDR3/DDR3L ICs
  (not so much on LPDDR3)
* There are multiple suppliers of DDR3 including some Chinese companies
* 4x DDR3 8/16-bit RAM ICs easily fit into around 15 cm^2.
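
As a quick sanity check, the three configurations listed above can be
compared against the "700MHz @ 32-bit or equivalent" requirement.  This is a
sketch only: it checks bus width and data rate, not power, pricing or board
area, and the names used are illustrative.

```python
# (description, number of ICs, data bits per IC, data rate in MHz DDR)
CONFIGS = [
    ("4x DDR3L 8-bit FBGA78", 4, 8, 1066),
    ("2x DDR3L 16-bit FBGA96", 2, 16, 1066),
    ("1x LPDDR3 32-bit FBGA168", 1, 32, 1866),
]

# Minimum from the requirements: 700MHz DDR on a 32-bit bus, or equivalent.
REQUIRED_MBITS = 700 * 32


def meets_requirement(ic_count: int, bits_per_ic: int, rate_mhz: int) -> bool:
    """True if the ICs together move at least the required megabits/second."""
    return ic_count * bits_per_ic * rate_mhz >= REQUIRED_MBITS


for name, count, bits, rate in CONFIGS:
    status = "ok" if meets_requirement(count, bits, rate) else "short"
    print(f"{name}: {count * bits}-bit @ {rate}MHz -> {status}")
```

All three configurations give a 32-bit bus with headroom over the minimum,
whereas a single 16-bit IC @ 1066MHz would fall short.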

Risks are reduced, pricing is competitive, supply is guaranteed, future
supply as speeds increase is also guaranteed, power consumption is reasonable.
Overall everything points towards DDR3 *at the moment*.  Despite the iMX6
still having nearly 15 years until it is EOL, meaning that Freescale / NXP
genuinely anticipate availability of the types (speed grades) of DDR3 RAM ICs
with which the iMX6 is compatible, it is *always* sensible to monitor the
situation continuously, and, critically, to bear in mind that, in the
projected lifespan planning, an SoC takes at least 18 months before it
hits production.

So from the moment that the SoC is planned, whatever peripherals (including
DRAM ICs) it is to be used with, the availability planning starts a
full *eighteen months* into the future.  For a libre SoC where many
people working on it will not consider signing NDAs, it becomes even
more critically important to ensure that whatever ICs it requires -
DRAM especially - are cast-iron guaranteed to be available within the SoC's
projected lifespan.  DDR3 can be said to meet that and all other
requirements.