[FoRK] The Really Good Times are Over

Eugen Leitl eugen at leitl.org
Wed Oct 23 01:52:11 PDT 2013


Next Gen Graphics and Process Migration: 20 nm and Beyond

Author: Josh Walrath

Date: October 22, 2013

Subject: Editorial


Tagged: UMC, TSMC, tri-gate, rendition, PD-SOI, nvidia, Intel,

The Really Good Times are Over

We really do not realize how good we had it.  Sure, we could apply that to
budget surpluses and the time before the rise of global terrorism, but in
this case I am talking about the predictable advancement of graphics due to
both design expertise and improvements in process technology.  Moore’s law
has been exceptionally kind to graphics.  When we look back and plot the
course of these graphics companies, we see that they have actually outstripped
Moore in terms of transistor density from generation to generation.  Most of this
is due to better tools and the expertise gained in what is still a fairly new
endeavor as compared to CPUs (the first true 3D accelerators were released in
the 1993/94 timeframe).

The complexity of a modern 3D chip is truly mind-boggling.  To get a good
idea of where we came from, we must look back at the first generations of
products that we could actually purchase.  The original 3Dfx Voodoo Graphics
comprised a raster chip and a texture chip.  Each contained approximately
1 million transistors (give or take) and was made on the then-available
.5 micron process (we shall call it 500 nm from here on out to give a sense
of perspective with modern process technology).  The chips were
clocked between 47 and 50 MHz (though often could be clocked up to 57 MHz by
going into the init file and putting in “SET SST_GRXCLK=57”… btw, SST stood
for Sellers/Smith/Tarolli, the founders of 3Dfx).  This graphics card, which
was revolutionary at the time, could push out 47 to 50 megapixels per second,
had 4 MB of VRAM, and was released in the beginning of 1996.

My first 3D graphics card was the Orchid Righteous 3D.  Voodoo Graphics was
really the first successful consumer 3D graphics card.  Yes, there were
others before it, but Voodoo Graphics had the largest impact of them all.

In 1998 3Dfx released the Voodoo 2, and it was a significant jump in
complexity from the original.  These chips were fabricated on a 350 nm
process.  There were three chips to each card, one of which was the raster
chip and the other two were texture chips.  At the top end of the product
stack were the 12 MB cards.  The raster chip had 4 MB of VRAM available to it
while each texture chip had 4 MB of VRAM for texture storage.  Not only did
this product double performance from the Voodoo Graphics, it was able to run
in single card configurations at 800x600 (as compared to the max 640x480 of
the Voodoo Graphics).  This was around the same time that NVIDIA started to
become a very aggressive competitor with the Riva TnT and ATI was about to
ship the Rage 128.

Process technology at this time improved in leaps and bounds.  Intel was
always at or near the lead with others like IBM and Motorola keeping pace.
TSMC was the first Pure-Play foundry selling line space to 3rd parties and
others such as Chartered and UMC were competitive across all of their lines.
TSMC has traditionally been the go-to foundry for the graphics industry, but
around this time UMC was a close second.  Within one and a half years from
the introduction of the Voodoo 2 and TnT class of graphics adapters, TSMC was
offering 250 nm lines for willing customers.  NVIDIA was one of the first
with the TnT 2 products, followed closely by 3Dfx and the Voodoo 3.  ATI was
a little bit behind with the Rage 128 Pro, but they were making progress in
keeping up.

Right after this we were introduced to the half-step for process nodes.  TSMC
released their 220 nm process for production and NVIDIA jumped on board with
the original GeForce 256.  We did not see the big jump in power and die size
benefits that a full process node can give, but it did provide a quick
transition for designers going to the next advanced node.  Moving along we
see the introduction of the 180 nm node and the GeForce 2 class of products.
The GeForce 2 GTS was a 25 million transistor chip that was running at 200
MHz.  Go back to the 2 million transistor Voodoo Graphics and we see that the
chip design of the GeForce 2 GTS is 12.5x more complex and runs at four times
the speed.  Only four years separate the Voodoo Graphics and the GeForce 2
GTS.

The NVIDIA Riva TnT was the first serious competitor for 3Dfx's lineup of
cards, including the then new Voodoo 2.

The pace did not slow down there.  Next up was the 150 nm half node from TSMC
and the GeForce 3 series.  This chip was a monster for the time.  It was one
of the first consumer level products that had a transistor count of around 57
million.  The GeForce 4, which was released a year after the GeForce 3 and
still used the 150 nm process, bumped that count up to around 67 million.
Then came the monster from ATI.  The R300, which powered the Radeon 9700 Pro,
was an astonishing 107 million transistors on the same 150 nm process.  In
the two years between 2000 and 2002 we see another quadrupling of transistor
counts between two process nodes (and a half node at that) and another 100 to
150 MHz of speed for a complex GPU.

Around 2004 things started to slow down a bit, but that is a relative term as
compared to the first eight years in 3D graphics.  I had written an article
at my old site that covered what I had expected to be a problem in the years
following.  “Slowing Down the Process Migration” discussed the inevitable
slowing of process node transitions due to issues in materials, design
strategies, and plain old physics.  Little did I know some of the major
issues that plagued the 130 nm jump (migrating voids, design rule changes
midstream, etc.) would be solved and we again returned to a very regular
cadence of process improvements.  130 nm led to 110, 90, 80, 65, 55, 45, 40,
32, and now 28 nm.  Graphics products did not inhabit every node, but they
hit all of the major ones (45 and 32 nm were absent from most graphics

So where are we at now?  In 2003 the top end product was the Radeon 9800 XT,
which ran at 412 MHz and comprised 117 million transistors on TSMC’s
highly optimized 150 nm process.  Today we are looking at the GTX TITAN based
on the NVIDIA GK110 processor that weighs in at 7 billion transistors and
around 850 MHz.  This represents twice the raw clockspeed and an astonishing
60 times the transistor count in the span of ten years.  It is
absolutely no wonder that we are spoiled by the constant stream of new
products that advance the state of the art on a yearly basis with a major
process node improvement every 18 months or so.
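
As a rough sanity check on the figures quoted so far, the implied doubling
time for transistor counts can be worked out directly.  The sketch below is
illustrative only: the function name doubling_time_months is made up for this
example, and the inputs are the approximate transistor counts and dates given
in this editorial (with the Voodoo Graphics counted as 2 million transistors
across its two chips).

```python
import math

def doubling_time_months(count_start, count_end, years):
    """Months per doubling implied by growing from count_start to count_end."""
    doublings = math.log2(count_end / count_start)
    return years * 12 / doublings

# Approximate figures as quoted in the text: (label, start, end, years elapsed).
eras = [
    ("Voodoo Graphics (1996) -> GeForce 2 GTS (2000)", 2e6, 25e6, 4),
    ("Radeon 9800 XT (2003) -> GTX TITAN (2013)", 117e6, 7e9, 10),
]

for label, start, end, years in eras:
    growth = end / start
    months = doubling_time_months(start, end, years)
    print(f"{label}: {growth:.1f}x, doubling every {months:.1f} months")
```

With these inputs the early years work out to a doubling roughly every 13
months, and the last decade to a doubling roughly every 20 months.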

With this highly aggressive pace from year to year, why are we in name-only
refresh-land for graphics right now?  I am starting to see a lot of commenters
discussing their displeasure at both NVIDIA and AMD for their lack of a true,
next-generation GPU.  The GK104 that originally powered the GTX 680 has
morphed into a variety of products including the GTX 770 and GTX 760.  The
GTX TITAN based on GK110 was released earlier this year and it has been repurposed
for the GTX 780.  AMD refreshed their lineups with last year’s Tahiti and
Pitcairn chips, and the top end Hawaii chip (R9 290X) only reaches the
complexity of last year’s GK110.  These parts are all based on TSMC’s 28 nm
process.  Where exactly are the new chips and why aren’t we at 20 nm yet?

Hitting the Wall Early and Often!

We as consumers have taken process advancements for granted.  Moore’s Law, as
it is popularly quoted, states that transistor densities will double every 18
months or so, and that has held true for a long time.  Companies like NVIDIA
set an aggressive pace in
terms of new products and refreshes that would often span around 14 to 16
months from start to finish.  So here we are some 22 months from the
introduction of the HD 7970 and we see this same part refreshed as the R9
280X.  During that time we saw some clock speed improvements as TSMC’s 28 nm
process matured, but the basic performance of the chip is essentially
unchanged.  Some 14 months after the release of the first GK104 parts from
NVIDIA, they too refreshed those exact same chips with the GTX 700 series.
Again, we saw a small bump in performance due to higher clockspeeds, but
there is no true next-generation part waiting in the wings.

What exactly has happened that has slowed the pace of advancement for
graphics?  There are two major factors seemingly at play: the rise of mobile
computing and the chips that are powering this revolution, and the extreme
slowdown of process migration as compared to historical trends.

I am still not entirely sure what voodoo ATI used at the time to get the
basic R300 design to run on TSMC's 150 nm process as effectively as they did.
The 9800 XT at 412 MHz is just sorta crazy, and it didn't break the bank when
it came to TDPs.  The original R300 was a singular moment in GPU history.

Mobile computing is perhaps the most tenuous reason, but it does make some
sense on several counts.  Both AMD and NVIDIA have mobile graphics groups
which take away design resources from their larger projects.  While the
modern Kepler and GCN architectures are able to scale from fairly low to
really high in terms of TDP, they are not entirely effective in the half-watt
space where smartphones primarily sit.  NVIDIA has a totally different
architecture for graphics in Tegra as compared to the desktop.  AMD does not
currently have a graphics architecture for that ultra-low TDP range; instead
they utilize GCN for products that are 4 watts and up.  NVIDIA is planning on
opening up Kepler to those areas, but they are not there yet.

Mobile computing is also a growth area for these companies as compared to
desktop and laptop graphics.  R&D resources now have to be spread out to the
different groups and they have to have competitive products, otherwise the
company will not be able to cash in on those growth numbers that we have been
seeing for the past several years.  After the mobile chips have been
developed, further resources are split off for software and hardware support
so these products can be integrated effectively into a 3rd party product.
This is all money
shifted away from desktop graphics.  Remember, desktop graphics is actually a
shrinking market due to the effective integration of graphics not just in the
mobile space, but also with higher powered CPUs/APUs from Intel and AMD.

Finally with mobile computing, we are seeing a lot more pressure on advanced
process lines in terms of wafer buys.  These ARM based chips are thriving at
the 32 nm and 28 nm nodes.  The vast majority of users are quite pleased by
the performance of these products across different workloads, and they have
excellent power characteristics.  These are relatively small chips, so quite
a few of them can be fit onto a wafer.  The problem here is the economics.
Margins are thin on these chips, and so the companies making the orders are
probably much more aggressive in pursuing contracts, and leveraging different
pure-play foundries against each other (TSMC, UMC, and GLOBALFOUNDRIES).
Samsung then throws another wrench into the mix by not just fabricating their
own parts (Exynos), but also selling fab space to their competition in the
form of Apple.  If these companies can in fact effectively negotiate lower
priced wafers with promises of filling up the lines with orders, then
companies such as TSMC will make less money per wafer as compared to more
complex products like GPUs.  Less money is less R&D for advanced process
features, and this behavior also maximizes the already spent R&D investment
on the current process.  The end result here is less money being allocated
towards advanced process development, so these advanced nodes will take
longer to develop.
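
To put rough numbers on how many small chips fit on a wafer compared to a big
GPU, here is a minimal sketch using a common first-order die-per-wafer
approximation.  Everything specific in it is an assumption made for
illustration and does not come from the foundries: the 300 mm wafer, the
80 mm^2 mobile SoC, the 550 mm^2 large GPU, and the function name
dies_per_wafer.  It ignores yield, scribe lines, and edge exclusion.

```python
import math

def dies_per_wafer(die_area_mm2, wafer_diameter_mm=300.0):
    """First-order estimate of whole dies per wafer.

    Wafer area divided by die area, minus a correction term for the
    partial dies lost around the circular edge.
    """
    radius = wafer_diameter_mm / 2
    gross = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(gross - edge_loss)

# Hypothetical die sizes, chosen only to contrast a small SoC with a big GPU.
mobile_soc_mm2 = 80.0
large_gpu_mm2 = 550.0

print("mobile SoC dies per wafer:", dies_per_wafer(mobile_soc_mm2))
print("large GPU dies per wafer: ", dies_per_wafer(large_gpu_mm2))
```

Under these assumptions a wafer holds on the order of 800 small SoCs but only
about 100 large GPU dies, which gives a feel for the volumes behind the
mobile wafer orders.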

The accountants at the foundries have some very complex equations to maximize
manufacturing and minimize expenses.  The risk of falling behind is always
there, but these foundries are used to being a process node behind the
industry leader (Intel) and still being able to pull good profits.  These
foundries also get a significant cost break by adopting technology well after
Intel has done the lion’s share of work (think optics, lithography, wafer
handling, deposition, etc.) and monetary investment.  Their motivation is to
stay close, but not risk the bleeding edge.  This is the opposite of what AMD
did when they owned their own fabs, as their primary product competed
directly with Intel.  Now that GLOBALFOUNDRIES is on its own, it has slowed
down its pace of next generation process technology introduction, much to
AMD’s chagrin.

Mobile computing has been a steady stream of income for the foundries as more
and more products require advanced chips to power them.  Again, maximizing
the investment in a current process line makes the company more money and
leverages the expenses much more effectively than trying to jump to the next
node as soon as possible.

This leads us into the slowdown of process technology that we are seeing.
While previous process nodes have had their issues (130 nm had void
migration, the jump to copper interconnects was not without problems, etc.)
it seems like the current 28 nm HKMG node was perhaps the last “easy” jump
that the foundry industry will see.  This is not to say that 28 nm HKMG was
easy, but the obstacles in the way towards 22/20 nm are pretty tremendous.
Intel was able to get to 22 nm over a year and a half ago with very good
results.  This came about because of the billions that Intel invested in
their fabrication technology.  They are the first to have implemented
Tri-Gate in mass produced parts.  This was not an inexpensive endeavor in
terms of money and man-hours.  Now, the reason why Intel went with the
Tri-Gate technology was not about beating its chest and proclaiming that they
had the most advanced process available; the reason was that they had no real
choice in the matter if they were going to produce high performance CPUs that
would scale power effectively with clock speed.

Intel spent billions to get 22 nm Tri-Gate up and running.  They are reaping
the benefits of this technology each and every quarter that the rest of the
industry lags behind.

22/20 nm processes can pack the transistors in.  Such a process utilizing
planar transistors will have some issues right off the bat.  This is very
general, but essentially the power curve increases very dramatically with
clockspeed.  For example, if we were to compare transistor performance from
28 nm HKMG to a 20 nm HKMG product, the 20 nm might in fact be less power
efficient per clock per transistor.  So while the designer can certainly pack
more transistors into the same area, there could be some very negative
effects from implementing that into a design.  For example, if a designer
wants to create a chip with the same functionality as the old, but increase
the number of die per wafer, then they can do that with the smaller process.
This may not be performance optimized though.  If the designer then specifies
that the chips have to run as fast as the older, larger versions, then they
run a pretty hefty risk of the chip pulling just as much power (if not more)
and producing more heat per mm squared than the previous model.
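
The steepness of that power curve can be illustrated with the textbook
first-order model for CMOS switching power, P = activity * C * V^2 * f.  This
is a sketch under stated assumptions, not data from any foundry: the model
ignores leakage entirely, the function names are made up for this example,
and the operating points (how much extra voltage a given clock increase
needs) are hypothetical.

```python
def dynamic_power(capacitance, voltage, frequency, activity=1.0):
    """First-order CMOS switching power: P = activity * C * V^2 * f."""
    return activity * capacitance * voltage ** 2 * frequency

def relative_power(freq_scale, volt_scale):
    """Switching power relative to baseline when f and V are scaled."""
    base = dynamic_power(1.0, 1.0, 1.0)
    return dynamic_power(1.0, volt_scale, freq_scale) / base

# Hypothetical operating points: (clock scale, voltage scale needed to reach it).
operating_points = [(1.0, 1.00), (1.1, 1.05), (1.2, 1.10), (1.3, 1.20)]

for f_scale, v_scale in operating_points:
    power = relative_power(f_scale, v_scale)
    print(f"clock x{f_scale:.2f}, voltage x{v_scale:.2f} -> power x{power:.2f}")
```

With these made-up operating points, a 20% clock increase that needs 10% more
voltage costs about 45% more switching power, because the voltage term is
squared.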

Intel got around this particular issue by utilizing Tri-Gates.  This
technology allowed the scaling of performance and power that we are
accustomed to with process shrinks.  This technology has worked out very well
for Intel, but it is not perfect.  As we have seen with Ivy Bridge and
Haswell, these products do not scale in speed as well as the older, larger 32
nm Sandy Bridge processors.  Both of the 22 nm architectures start pulling in
more power than the previous generation when clockspeeds go past 4.0 GHz.
Having said that, the Intel 22 nm Tri-Gate process is exceptionally power
efficient at lower clockspeeds.  The slower the transistors switch, the more
efficient they are.  These characteristics are very favorable to Intel when
approaching the mobile sector.  This is certainly an area that Intel hopes to
clean up in.  This is the area that is finally scaring all the other 3rd
party SOC designers (Qualcomm, Samsung, NVIDIA, etc.) and potentially putting
more pressure on the pure-play foundries to get it together.

20 nm and Below

Getting to 20 nm for these foundries is a challenge.  The first area is that
of lithography.  The industry is not at the point where EUV (Extreme UV) is
effective or affordable, or even workable for that matter.  To achieve the
geometries required, the foundries have to use immersion litho,
multiple-patterning, and other optical techniques to effectively complete the
litho stage.  This is not even delving into the materials needed.  Currently
plans are that these will be bulk silicon wafers, but they will be using
second generation HKMG, third generation SiGe strain technologies, and a
gate-last approach.  The current 28 nm HKMG processes from GLOBALFOUNDRIES and
its Common Platform partners employ a less complex gate-first approach that is
more cost effective, while TSMC already uses gate-last at 28 nm.  At 20 nm,
there really is no choice but to force 3rd parties to adopt gate-last to get
the best results.

Some two years ago GLOBALFOUNDRIES showed off their test 28 nm SRAM wafers.
Nobody expected them to be as delayed as they ended up being.

TSMC and others are busy developing their own technology akin to Tri-Gates.
These are called 3D FinFETs.  The basic design and physics behind these
structures are essentially the same, but Intel trademarked theirs first.  The
problem here is that we are still at least two years away from an effective
implementation of FinFETs on any node from any pure-play foundry.  So the GPU
guys are looking at a new process node that will effectively shrink the
transistors, but may not have the electrical characteristics they were hoping
for.  TSMC is not planning on opening up their 20 nm HKMG planar based lines
until Q1/Q2 2014 with product being delivered in a Q3 timeframe.  TSMC is
ahead of the bunch so far with actually implementing a 20 nm line.

GLOBALFOUNDRIES is also developing advanced process nodes, but so far things
have been disappointing for the company.  When they first came on the scene
some years back there was a lot of hope that they could move past TSMC in
terms of implementing advanced process technologies since they had the
impetus of being ahead in the days when AMD owned the fabs.  That impetus
soon went away once the price of implementing these advanced technologies did
not mesh with the economics of being a pure-play foundry.  While TSMC had
opened their 28 nm HKMG nearly two years ago, it was less than a year ago
that GF opened their 28 nm line to customers.  So far the designs from GF’s
28 nm have essentially been smaller SOCs from players like MediaTek and
Rockchip.  AMD’s Kaveri APU has been delayed by what look to be
fabrication issues rather than design issues.  We do not expect to see Kaveri
in mass quantities until Q1 2014.  GF has been behind the times when it comes
to process technology, but some interesting things have come up that could
change the landscape.

SOI Strikes Again

Who all thought that SOI was dead after AMD decided to stop using it after
moving away from GF’s 32 nm PD-SOI process?  Well, more than a few, but the
truth is SOI is a very handy technology that is used in many other products,
one of which is high speed RF switching applications.  AMD has been utilizing
PD-SOI for many generations of parts.  Partially-Depleted is an older and
well understood substrate that has done very well for AMD.  Unfortunately for
AMD, 32 nm was really the last gasp for PD-SOI.  Going below that size, the
electrical characteristics of PD-SOI are not that much better than bulk
silicon.  Though there is a positive difference, it is not enough to justify
the 10% to 15% increase in wafer costs and slightly more complex
manufacturing process. 

The future site of FD-SOI integration?  Fab 8 is a sprawling complex that
could be the new epicenter of advanced materials research utilizing SOI.

All is not lost for SOI.  FD-SOI (fully depleted) looks to be a very
interesting and cost effective strategy for going below 28 nm.  FD-SOI does
cost more per wafer, but most of the processing equipment is the same as for
bulk silicon.  There does need to be special handling and usage of materials,
but it is not nearly as complex as implementing FinFETs.  FD-SOI will utilize
planar transistors, making manufacturing much simpler than that required
for FinFETs.

The problem with FD-SOI manufacturing is that so far only one company has
done it.  ST-Micro owns and operates a small Fab in France that can produce,
at max, around 500 wafer starts per week.  Most of the mega-Fabs around the
world can produce around 5000 to 12,000 wafer starts per week.  This Fab
obviously cannot provide enough manufacturing space except for small clients
with minimal needs.  ST-Micro has licensed out the technology to
GLOBALFOUNDRIES, but the next issue is that so far it is aimed at only two
nodes: 28 nm and the distant 14 nm.  We do not know if GF has plans for 20
nm, but looking at the global marketplace and the potential demand it seems
that 20 nm FD-SOI would be a very good target to aim at.

FD-SOI seems like it answers most of the issues that crop up with the 22/20
nm node.  It does not require massive design rule changes, it can re-use a
lot of bulk silicon manufacturing technology, and it runs perfectly fine with
planar transistors at 22/20 nm.  In a gate-last configuration, FD-SOI with
planar transistors actually looks like it outperforms and scales
significantly better than Intel’s 22 nm Tri-Gate process.  A theoretical 20
nm FD-SOI process would have smaller features and be able to scale with
clockspeed without as steep a power curve as what we see currently with
Intel’s 22 nm Tri-Gate.  In terms of lower power operation, it appears as
though FD-SOI is no better or worse than what Intel offers.

The machinery is prepped and ready for GLOBALFOUNDRIES, but until we see how
they do with mass production of Kaveri, we are unsure where they really sit.

Sounds perfect, right?  The problem is of course money and man-hours.  FD-SOI
wafers are not being mass produced at this time, though production can be
ramped up fairly quickly.  GF has yet to implement 28 nm FD-SOI at any of
their fabs and the timeline for manufacturing products on this process has
yet to be determined (or at least released to the general public).  Also,
there is no public roadmap for a 20 nm FD-SOI process to be offered from

GF has only now started mass production of the latest generation of AMD APUs
based on 28 nm.  The previous Kabini APUs were all produced by TSMC on their
28 nm process.  GF’s work on 20 nm has been confined to their test labs and
very little is known about its characteristics.  Perhaps they are going for
the jugular and are preparing a 20 nm FD-SOI, but so far their track record
for hitting their process milestones has been lacking.  GF has increased
their marketshare and are more competitive with TSMC, but so far they have
not been able to compete adequately with what TSMC offers (plus their revenue
is 1/5 that of TSMC).  Working with ST-Micro to implement 28 nm FD-SOI is a
benefit for the company as there are customers who are interested in that
particular node.  The performance and power results on ARM processors on 28
nm FD-SOI are outstanding, especially considering the relatively small cost
increase to utilize FD-SOI wafers.

What is the Point of this Editorial?

Many people were waiting on a true, next generation GPU to be released at
this time.  While the Hawaii GPU from AMD is a new (and potentially exciting)
part, it is not the big jump that many were hoping for.  It looks to compete
with the GTX TITAN, but it will not leapfrog that part.  It will probably end
up faster, but only by a couple of percentage points.  It will not be the big
jump
we have seen in the past such as going from a GTX 580 or HD 6970 to a GTX 680
or HD 7970.

Until 20 nm HKMG becomes available for production, we are in for a wait.
TSMC expects to be able to provide mass quantities of these parts by Q3 2014,
but that is not entirely set in stone.  My gut feeling here is that TSMC will
be pretty close to that timeline and we would expect to see 20 nm GPUs
hitting the market in around a year from now.  The problem that we are
potentially looking at could very well be heat and power constraints holding
these designs back.  I do not doubt that it will be a nice jump in terms of
performance from these next gen parts, but the use of 20 nm bulk will limit
the potential of these products from a power consumption standpoint.

The NVIDIA GK110, which powers the GTX Titan and GTX 780, is a huge chip
which packs in over 7 billion transistors.  Expect to see this (and possibly
a refreshed version) be the top end chip for a while.

If GLOBALFOUNDRIES has the ability to economically research, develop, and
produce parts on 20 nm FD-SOI, they could be hitting one out of the park.
The industry is clamoring for a product that can match the power
characteristics of Intel’s 22 nm process.  Intel’s Bay Trail products are
causing much concern for the ARM folks, though it will still be a while
before Intel can ingratiate itself with many of the major handheld
manufacturers who have longstanding partnerships with companies such as
Qualcomm and Samsung.  3D FinFETs from TSMC are still at least 2 years away
on 20 nm, not to mention sub-20 nm lines like 16 nm and 14 nm products that
have been described by pure-play foundries.

Intel is also very close to production of 14 nm parts towards the end of this
year.  The 14 nm process is again Tri-Gate based with bulk silicon wafers.
Intel claims that it can adequately control power and clockspeed, but I find
it telling that the first products to be introduced on 14 nm are BGA-only
based Broadwell parts.  On the desktop there will be a Haswell refresh at 22
nm.  This indicates that 14 nm will again be a nice step up in transistor
density and low speed power consumption, but for desktop and workstation
applications it might not be entirely adequate.  Beyond 14 nm Intel is in
fact looking at FD-SOI very carefully.  In the end, materials are king when
it comes to process technology.  We have also just learned that Intel is
delaying the Broadwell introduction for at least a quarter due to
unacceptable defect levels on their 14 nm process with this particular
product.  Even with billions in R&D and some of the most talented engineers
in the industry, Intel still faces many problems with their introduction of
advanced process nodes.

The pure-play foundries will have to rely on FinFET technology to go
below 20 nm.  We will see a good mix of bulk and FD-SOI products, though we
have no idea who else ST-Micro will license FD-SOI to.  The combination of
FinFET/FD-SOI holds a lot of promise, but we are still at least three years
away from such an implementation.

It was all downhill for process technology after they allowed Allyn into a
Fab.  He ruins everything.

The long and short of it is that we can expect longer time intervals between
releases of next-generation GPU architectures as they are being constrained
by the very latest process technology available.  20 nm bulk will be one year
from now, 20 nm FD-SOI is at least 1.5 years away, and any process node below
that appropriate for GPUs will be another 3 years.  AMD and NVIDIA will have
to do a lot of work to implement next generation features without breaking
transistor budgets.  They will have to do more with less, essentially.
Either that or we will just have to deal with a much slower introduction of
next generation parts.  Marketing and product segmentation will rear their
ugly heads, and we will see a very slow reduction in prices from when a
product is introduced.  We have been spoiled for the past 18 years, but it
seems like the good times are over and a whole lot of work is ahead of both
designers and foundries.

We still have many years ahead of us for product advancement, and that will
continue until we start seeing the 7 nm to 5 nm process nodes.  After that,
we are in for some rough times.  Quantum physics will start to derail silicon
based chips and we will have to move to more exotic materials to keep pace.
This is all assuming that EUV will actually work as intended.  If that does
not happen, then we will have to look at other potential lithography measures
such as x-ray.  There are many, many challenges ahead of the process
technology people, and until some of these basic problems are solved we will
likely never again see the rapid march of technology that we have experienced
since the birth of the silicon transistor some 50 years ago.
