Posted: October 1, 2014

The A8 Processor

       

On to the Apple A8 chip. Apple said that it has two billion transistors (twice that of the A7) and is reportedly built on TSMC’s 20 nm process. At 89 mm², it is 13 percent smaller than the A7 (102 mm²), while Apple claims a CPU that is up to 25 percent faster and graphics that are up to 50 percent faster. It is up to 50 times faster than the original iPhone’s CPU and up to 84 times faster at graphics, all while being 50 percent more energy efficient than the A7, which we hope is a good sign for the batteries on those big ol' displays.

What you see above is the memory part of the package-on-package (PoP); the actual A8 chip is buried underneath. We’ll have to wait until we’ve separated the two to see the A8 chip itself. The package markings do tell us something, though - for one, Apple has changed the part number convention from the usual “98” suffix for their APUs (the A4 was APL0398, the A5 was APL0498, etc.), to APL1011 this time. Another conclusion is that the DRAM in the top part is only 1 GB, in contrast to the up to 3 GB seen in other phones these days.

We now have a photograph of the bottom A8 package from the PoP, including an apparent part number different from anything on the top of the PoP. The date code is 1434, which means the chip went from the packaging company (likely in Taiwan) to Foxconn in China and on to a store shelf in Ottawa in only six weeks - not bad for just-in-time!
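The six-week figure can be checked with a quick sketch. It assumes the four-digit date code reads as YYWW, that the year is 20YY, and that weeks follow ISO numbering; package date codes do not always follow ISO weeks exactly, and `decode_date_code` is a hypothetical helper name, not anything taken from the package.

```python
from datetime import date

def decode_date_code(code: str) -> date:
    """Return the Monday of the week named by a YYWW date code.

    Assumes ISO week numbering and a 20YY year (both are assumptions).
    """
    year = 2000 + int(code[:2])
    week = int(code[2:])
    return date.fromisocalendar(year, week, 1)

packaged = decode_date_code("1434")   # date code read off the A8 package
posted = date(2014, 10, 1)            # date of this post
weeks_elapsed = (posted - packaged).days / 7

print(f"week of {packaged}, {weeks_elapsed:.1f} weeks before this post")
```

Under those assumptions the code points to the week starting August 18, 2014, a little over six weeks before this post.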

We won’t comment on the part number (P0XY is a derogatory slang term in the UK), but the package itself shows three rows of solder balls, up from the usual two. This is a trend that started with the A7, and is a result of the increased heat dissipated by the more powerful graphics in today’s chips. The memory in the top package has not changed, so the memory has no need for the extra connectivity that a third row of solder balls provides.

UPDATE: We’ve added above the die photograph showing the top metal layer.

The die size is 8.5 mm x 10.5 mm = 89.25 mm², as stated by Apple. Indications are that it is fabbed by TSMC, but we don’t have enough images yet to be 100% sure. The first cross section of the structure shows that there are ten metals in the stack (see the third thumbnail image above).

We can tell you that the contacted gate pitch is ~90 nm, which agrees with our report on the Qualcomm MDM9235, also fabbed by TSMC in their 20 nm process. (Contacted gate pitch is a measure of the process node in which a device is manufactured, and is quoted in most technical papers when a company announces a new process.)

After seeing more die markings, we should add that the P0XY “part number” referred to above appears to be a lot code, since the equivalent markings on other dies are very different; so only a coincidental row of digits, not an unfortunate device designation.  

UPDATE: We have now looked at a good comparison of images with the Qualcomm part, and have convinced ourselves that the A8 is indeed made at TSMC.

UPDATE: We now have a transistor-level image with some tentative conclusions as to the functionality of the A8.

These were the result of three gurus debating this state-of-the-art processor: our very own Dick James and Randy Torrance debated the merits of a memory block and the size of cache and SRAM with our friend Ryan Smith, Editor-in-Chief of AnandTech. For those who hadn’t heard, Anand Lal Shimpi has retired from AnandTech at the ripe old age of 32, and Ryan has taken over his seat. Congratulations to Ryan!

First, for comparison, we include a shot of the A7 from the iPhone 5s. The first things to note are that the CPU and GPU have swapped sides, and we still have our mystery block of SRAM, in the same position. The speculation last year was that it was an L3 cache, or some sort of cache or buffer for the GPU. Since it is now on the far side of the chip, the latter seems less likely.

There has been much comment that the die is notably smaller than the A7, at 89 mm² vs. 104 mm², or 85% of the A7 area, yet with two billion transistors vs. “over a billion”. If we take the 28 nm to 20 nm node size reduction at face value, area scales with the square of the linear shrink, (20/28)² ≈ 0.51, so with the same functionality the die should shrink to ~51% of the A7 size, or ~53 mm². Clearly at 89 mm², there is quite a bit of extra power added into the die.
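The shrink estimate above can be written out as a short calculation. This is only a sketch: it assumes area scales ideally with the square of the node-name ratio, which real processes rarely match, and it uses the A7 die area quoted in this paragraph. The function and variable names are illustrative.

```python
def ideal_area_scale(old_node_nm: float, new_node_nm: float) -> float:
    """Ideal area scaling: linear dimensions track the node, area tracks its square."""
    return (new_node_nm / old_node_nm) ** 2

a7_area_mm2 = 104.0  # A7 die area as quoted in this paragraph
a8_area_mm2 = 89.0   # A8 die area

scale = ideal_area_scale(28, 20)
shrunk_a7_mm2 = a7_area_mm2 * scale      # A7 functionality redrawn at 20 nm
extra_mm2 = a8_area_mm2 - shrunk_a7_mm2  # area left over for new functionality

print(f"ideal scale factor: {scale:.0%}")
print(f"A7 shrunk to 20 nm: {shrunk_a7_mm2:.0f} mm^2")
print(f"area beyond a straight shrink: {extra_mm2:.0f} mm^2")
```

On these numbers, roughly 36 mm² of the A8 die is unaccounted for by a straight shrink of the A7.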

If we look at the CPU, it is still a dual-core, but now has an area of 12.2 mm² vs. 17.1 mm² in the A7, or 71%, well above the ~51% that a straight shrink would give; so again, more functionality has been added. Ryan speculates that it is a tweaked version of the A7’s Cyclone CPU. Looking at the two core layouts, it may be that each core has its own L2 cache, as opposed to the shared cache of the A7, and it is also possible that both the L1 and L2 caches are larger, up from 256 KB and 1 MB respectively.

Similarly, we again have a 4-core GPU, but at 86% of the area (19.1 vs. 22.1 mm²), so even more tweaks must have been added. Ryan also surmised that Apple would be using the Imagination PowerVR Series6XT GX6450, but if so it takes up more area than we expected. Subjectively, there again seems to be more SRAM in the cores than in the A7 device. (One thing about going from 28 nm down to 20 is that SRAM cell size shrinks too - in our analysis of the 20-nm Qualcomm MDM9235, the SRAM cell size is ~0.08 µm²; so adding memory is cheap in terms of area.)
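To put the CPU and GPU figures side by side, here is a small sketch that compares each block's measured area ratio with an ideal 28 nm to 20 nm shrink. The ideal (20/28)² scaling is an assumption, and the "growth" figure is just a rough proxy for added logic and memory, not a measured quantity.

```python
IDEAL_SCALE = (20 / 28) ** 2  # assumed ideal area scaling, about 0.51

# (A7 area, A8 area) in mm^2, as measured in the text
blocks = {"CPU": (17.1, 12.2), "GPU": (22.1, 19.1)}

ratios = {}
for name, (a7_mm2, a8_mm2) in blocks.items():
    actual = a8_mm2 / a7_mm2
    # How much larger the A8 block is than a straight shrink of the A7 block
    growth = actual / IDEAL_SCALE
    ratios[name] = (actual, growth)
    print(f"{name}: {actual:.0%} of A7 area, {growth:.2f}x an ideal shrink")
```

Both blocks come out well above an ideal shrink, with the GPU further above it than the CPU.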

The floorplan now shows our joint conclusions of the GPU layout, with texture units shared between the two vertical pairs of cores, and shared general logic between the horizontal pairs.

Now we get to the mystery SRAM block - as you can see from the image, we have decided that it is indeed an L3 type of cache memory. In terms of area, it’s about 80% of the A7 block, so we’re tempted to say there is more than the 4 MB in the A7. However, the SRAM cell size has not shrunk 50%; we’ve gone from ~0.12 to ~0.08 µm², so at best we would get a 33% size reduction. Add in some more complex circuitry for memory read/write (it’s not easy to keep stable data with a cell size that small), and we likely still have 4 MB of cache.
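The cache-size reasoning can be sketched numerically. The cell sizes and the 80% block-area ratio come from the text; treating the whole block as if it scaled like the bit cell is a deliberately optimistic assumption, since decoders, sense amplifiers and routing do not shrink as well. Variable names are illustrative.

```python
BITS_4MB = 4 * 1024 * 1024 * 8  # bits in a 4 MB cache

a7_cell_um2 = 0.12       # approximate SRAM cell size at 28 nm
a8_cell_um2 = 0.08       # approximate SRAM cell size at 20 nm
block_area_ratio = 0.80  # A8 SRAM block area relative to the A7 block

cell_ratio = a8_cell_um2 / a7_cell_um2  # about 0.67, i.e. a 33% reduction

# Raw bit-cell area for 4 MB (excludes decoders, sense amps, routing)
a7_array_mm2 = BITS_4MB * a7_cell_um2 / 1e6
a8_array_mm2 = BITS_4MB * a8_cell_um2 / 1e6

# Optimistic upper bound: capacity if the entire block scaled like the cell
max_capacity_mb = 4 * block_area_ratio / cell_ratio

print(f"4 MB of cells: {a7_array_mm2:.2f} mm^2 -> {a8_array_mm2:.2f} mm^2")
print(f"upper bound on A8 capacity: {max_capacity_mb:.1f} MB")
```

Even this optimistic bound stays under 5 MB, which fits the conclusion that the block is most likely still 4 MB once read/write circuitry is taken into account.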

Check here for Ryan's take on our look at the A8.