Scott Wasson Q&A: What is “GPU IPC?” & Pipeline Stage Discussion
Friday, 21 April 2017
Our third and final interview featuring Scott Wasson, current AMD RTG team member and former editor-in-chief of The Tech Report, has just gone live with information on GPU architecture. This video focuses on a handful of reader and viewer questions, pooled largely from our Patreon backer Discord, with the big item being “GPU IPC.” Patreon backer “Streetguru” submitted the question, asking why an RX 480 at ~1300-1400MHz could perform comparably to a GTX 1060 at ~1800MHz. It’s a good question: it’s easy to say “architecture,” but to learn more about the why, we turned to Wasson.
The main event starts at 1:04, with some follow-up questions scattered throughout Wasson’s explanation. We talk about pipeline stage length and its impact on performance, wider versus narrower machines with frequencies that match, and voltage “spent” on each stage.
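To put rough numbers on the wider-versus-narrower point, the sketch below computes peak FP32 throughput as shader ALUs × 2 FLOPs per clock (one fused multiply-add) × clock speed, using the published reference shader counts and boost clocks (RX 480: 2304 stream processors at 1266MHz; GTX 1060 6GB: 1280 CUDA cores at 1708MHz) alongside the ~1300MHz and ~1800MHz figures from the question. On paper, the wider RX 480 has more raw throughput. That the two cards trade blows in games suggests Pascal turns more of its peak throughput into frame rate, which is roughly what “GPU IPC” is getting at. This is a simplification: peak FLOPS ignores memory bandwidth, geometry and raster throughput, scheduling, and drivers.

```python
# Peak FP32 throughput: shader ALUs x 2 FLOPs per clock (one FMA) x clock speed.
# This is a paper number, not a prediction of game performance.

def peak_tflops(shaders: int, clock_mhz: float) -> float:
    """Peak single-precision TFLOPS, assuming one FMA (2 FLOPs) per ALU per clock."""
    return shaders * 2 * clock_mhz * 1e6 / 1e12

cards = {
    # name: (shader ALUs, clock in MHz)
    "RX 480, reference boost": (2304, 1266),
    "GTX 1060 6GB, reference boost": (1280, 1708),
    "RX 480 at ~1300MHz (from the question)": (2304, 1300),
    "GTX 1060 at ~1800MHz (from the question)": (1280, 1800),
}

for name, (shaders, clock) in cards.items():
    print(f"{name}: {peak_tflops(shaders, clock):.2f} TFLOPS")
```

The RX 480 gets its throughput from width (roughly 1.8x the ALUs), while the GTX 1060 leans on clock speed; neither figure says how often those ALUs are kept busy in a real game.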
We’ll leave this content piece primarily to video, as Wasson does a good job of conveying the information quickly.
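As a rough illustration of the pipeline-stage trade-off, the classic textbook toy model below splits a fixed amount of logic across N stages. Each stage adds a fixed latch overhead, so deeper pipelines clock higher but with diminishing returns. Meanwhile, each stall is assumed to cost about one pipeline's worth of cycles, so IPC falls as depth grows. All numbers (4 ns of logic, 0.1 ns latch overhead, a 10% stall rate) are hypothetical, and the model ignores the voltage and power side of the discussion entirely. It is a sketch of the general principle, not a description of any AMD or Nvidia design.

```python
# Toy model of pipeline depth vs. performance. All parameters are hypothetical.
# - logic_ns: total logic delay per instruction, split evenly across stages
# - latch_ns: fixed per-stage overhead (pipeline registers) that does not shrink with depth
# - stall_rate: fraction of instructions that stall; each stall is assumed to cost
#   roughly `stages` cycles (a simplifying assumption)

def clock_ghz(stages: int, logic_ns: float, latch_ns: float) -> float:
    """Max clock in GHz: cycle time is the per-stage share of logic plus latch overhead."""
    cycle_ns = logic_ns / stages + latch_ns
    return 1.0 / cycle_ns

def cycles_per_instruction(stages: int, stall_rate: float) -> float:
    """Ideal CPI of 1, plus stalls that each cost about one pipeline depth of cycles."""
    return 1.0 + stall_rate * stages

def instructions_per_ns(stages: int, logic_ns: float = 4.0,
                        latch_ns: float = 0.1, stall_rate: float = 0.1) -> float:
    """Performance = clock / CPI (instructions completed per nanosecond)."""
    return clock_ghz(stages, logic_ns, latch_ns) / cycles_per_instruction(stages, stall_rate)

for n in (5, 10, 20, 40):
    print(f"{n:>2} stages: {clock_ghz(n, 4.0, 0.1):.2f} GHz, "
          f"IPC {1 / cycles_per_instruction(n, 0.1):.2f}, "
          f"{instructions_per_ns(n):.3f} instructions/ns")

# Depth that maximizes performance under these hypothetical parameters.
best_depth = max(range(1, 61), key=instructions_per_ns)
print("best depth:", best_depth)
```

Clock speed keeps rising with depth, but past the optimum, the IPC lost to stalls outweighs the frequency gained, which is why "higher clock" and "faster" are not the same thing.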
Between its visit to the White House and Intel’s annual Investor Day, we’ve collected a fair bit of news regarding Intel’s future.
Beginning with the former, Intel CEO Brian Krzanich chose the White House Oval Office as the backdrop for announcing Intel’s plans to bring Fab 42 online and prepare it for 7nm production. Fab 42, located in Chandler, Arizona, was originally built between 2011 and 2013, but Intel shelved plans to finish the fab in 2014. The revived Arizona factory is expected to support up to 10,000 jobs, with completion projected in 3-4 years. Intel is prepared to invest as much as $7 billion to upfit the fab for its 7nm manufacturing process, although little is known about that process yet.
AMD has released cursory details on its Vega GPU architecture, covering the high-bandwidth cache, an iterative step from CUs to NCUs (Next-Gen Compute Units), and a unified-but-not-unified memory configuration.
Going into this, note that we’re still not 100% briefed on Vega. We’ve worked with AMD to try to better understand the architecture, but the details aren’t fully organized for press just yet. We’re also not privy to product details at this time, such as shader counts, memory capacity, and individual SKUs. Instead, we have some high-level architecture discussion. It’s enough for a start.
Taiwan Semiconductor Manufacturing Co. (TSMC) has set its sights on building a new $15.7 billion facility geared toward future 5nm and 3nm process nodes. TSMC is the world’s largest contract chip maker by revenue, holding 55% of that market. TSMC’s deep-pocketed clients include Qualcomm, nVidia, and Apple, whose iPhone 7 launch was especially pivotal to the record quarter-over-quarter profits TSMC has been reporting, as TSMC produces the A10 processor for the iPhone 7.
Taiwan Semiconductor is headquartered in Northern Taiwan, where several of its fabs are located. It also operates leading-edge fabs in Southern and Central Taiwan, as well as manufacturing bases in China.
Shader Intrinsic Functions to Bypass Abstraction Layers (w/ Raja Koduri)
Monday, 17 October 2016
Abstraction layers that sit between the game code and hardware create transactional overhead that worsens software performance on CPUs and GPUs. This has been a major discussion point as DirectX 12 and Vulkan have rolled out to the market, particularly with DOOM's successful Vulkan implementation. Long-standing API incumbent Dx 11 sits unmoving between the game engine and the hardware, preventing developers from leveraging specific system resources to efficiently execute game functions or rendering.
By contrast, lower-level access makes it possible, for example, to optimize tessellation performance by explicitly changing how its execution is handled on Pascal, Polaris, Maxwell, or Hawaii architectures. A developer could accelerate performance by directly commanding the GPU to execute code on a reserved set of compute units, or could leverage asynchronous shaders to process render tasks without getting “stuck” behind other instructions in the pipeline. This can't be done with higher-level APIs like Dx 11, but DirectX 12 and Vulkan both allow this lower-level hardware access; you may have seen it referred to as “direct to metal” or “programming to the metal.” These phrases refer to that explicit hardware access and have historically described what Xbox and PlayStation consoles enable for developers. Only recently has this level of support come to PC.
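As one concrete example of the lower-level access described above: in Vulkan, an application discovers asynchronous compute capability by calling vkGetPhysicalDeviceQueueFamilyProperties and looking for a queue family that reports VK_QUEUE_COMPUTE_BIT without VK_QUEUE_GRAPHICS_BIT, so compute work can be submitted independently of the graphics queue. The Python sketch below mimics only that selection heuristic on a hypothetical list of queue families. The flag values mirror Vulkan's VkQueueFlagBits, but the code does not call Vulkan, and the function name is our own. Preferring a compute-only family is a common heuristic, not a requirement of the API.

```python
from enum import IntFlag

class QueueFlags(IntFlag):
    # Bit values mirror Vulkan's VkQueueFlagBits (only the bits used here).
    GRAPHICS = 0x1
    COMPUTE = 0x2
    TRANSFER = 0x4

def pick_async_compute_family(families):
    """Hypothetical helper: return the index of a compute-capable queue family.

    Prefers a compute-only family (compute without graphics) so compute work
    can run alongside the graphics queue; falls back to any compute-capable
    family; returns None if no family supports compute.
    """
    for i, flags in enumerate(families):
        if (flags & QueueFlags.COMPUTE) and not (flags & QueueFlags.GRAPHICS):
            return i
    for i, flags in enumerate(families):
        if flags & QueueFlags.COMPUTE:
            return i
    return None

# Hypothetical queue families, similar in shape to what a discrete GPU might report.
example_families = [
    QueueFlags.GRAPHICS | QueueFlags.COMPUTE | QueueFlags.TRANSFER,  # general-purpose queue
    QueueFlags.COMPUTE | QueueFlags.TRANSFER,                        # compute-only queue
    QueueFlags.TRANSFER,                                             # copy/DMA queue
]

print("async compute family:", pick_async_compute_family(example_families))
```

Dx 11 offers no equivalent choice to the developer: there is a single immediate context, and any overlap of graphics and compute work is left to the driver.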
On our recent return trip to California (see also: Corsair validation lab tour), we visited AMD's offices to discuss shader intrinsic functions and accelerating GPU performance by leveraging low-level APIs.
Ask GN 28: HBM on CPUs, GPU Boost 3.0 Curiosities, & More Test Methods
This episode of Ask GN (#28) addresses HBM in non-GPU applications, primarily its imminent deployment on CPUs. We also explore GPU Boost 3.0 and its variance in testing on the new GTX 1080 cards. The Boost question arose in response to our EVGA GTX 1080 FTW Hybrid vs. MSI Sea Hawk 1080 coverage, asking why one 1080 was dropping clocks differently from another. We talk about that in this episode.
Discussion begins with proof that the Cullinan finally exists and has been sent to us (it was impossible to find after Computex), then carries into coverage of Intel's Knights Landing and its MCDRAM, or “CPU HBM.” Testing methods are slotted in between, with an explanation of why certain hardware choices are made when building a test environment.
AMD AM4 Chipset Specs: B350, A320, X/B/A300 & A12-9800 APU, X4 950
AMD's Gen 7 APUs and the AM4 platform have officially begun shipping in some OEM systems this weekend, primarily through physical retail locations. AMD's launch includes entry-level and mainstream AM4 chipsets, with the high-end Zen chipset (990FX equivalent) promised for a later date. AM4 platform shipment begins with the B350, A320, and X/B/A300 chipsets alongside the A12-9800 and lower-end APUs.
Let's run through the finalized specs for the new Gen 7 APUs first, then talk AM4 chipset specs. Note that AM4 unifies AMD's FM and AM platforms, so Zen's FX-line equivalent and the Gen 7 APUs will both function on the same motherboards. The below table (following the embedded video) provides the specs for the A12-9800, X4 950, and other relevant chips:
The week following IDF brought several news items on general computing technology and product announcements. As one might expect, Intel unveiled more Kaby Lake information at its self-titled "Intel Developer Forum," and OCaholic posted a SKU listing for the new Kaby Lake CPUs up to the 7700K. Our news round-up video discusses the limited specifications available for the i5-7600K, i7-7700K, and lower-TDP chips, along with Intel's launch plans.
We also look to the world of peripherals for the Logitech G Pro mouse, equipped with the PMW3366 sensor, and to the world of cases for X2's new "Empire" enclosure.
More in the video or script below, if you prefer:
This week's Ask GN episode answers viewer questions about FinFET vs. planar transistors, the impact of cooling on power consumption, CPU load for 120Hz / 144Hz displays, liquid cooler testing, and a few extras. We spend most of the time talking liquid coolers and cooler testing, a fitting topic, having done multiple “Hybrid” video card builds lately.
The full list of questions with their timestamps can be found below the video. Thanks to our viewers for the questions and, as always, post more in the video comments on YouTube for inclusion in next week's episode.
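On the CPU-load question for 120Hz and 144Hz displays, the underlying arithmetic is the per-frame time budget: to sustain a given frame rate, the CPU's share of each frame (game logic, draw-call submission) has to finish within 1000 / refresh-rate milliseconds. A minimal sketch follows; the 7.5 ms CPU frame time is a hypothetical figure, and the check assumes the game is actually uncapped and trying to render at the display's refresh rate.

```python
def frame_budget_ms(refresh_hz: float) -> float:
    """Time available per frame (in ms) to sustain `refresh_hz` frames per second."""
    return 1000.0 / refresh_hz

def cpu_can_sustain(cpu_frame_ms: float, refresh_hz: float) -> bool:
    """True if a CPU needing `cpu_frame_ms` per frame fits within the frame budget."""
    return cpu_frame_ms <= frame_budget_ms(refresh_hz)

for hz in (60, 120, 144):
    print(f"{hz:>3} Hz -> {frame_budget_ms(hz):.2f} ms per frame")

hypothetical_cpu_ms = 7.5  # hypothetical CPU time per frame, not a measured value
for hz in (120, 144):
    print(f"{hypothetical_cpu_ms} ms CPU frame time sustains {hz} Hz:",
          cpu_can_sustain(hypothetical_cpu_ms, hz))
```

Moving from 60Hz to 144Hz cuts the budget from about 16.7 ms to about 6.9 ms per frame, so the same CPU work per frame becomes a much larger share of the available time.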
While Intel's Developer Forum was underway in San Francisco, not far from AMD in Sunnyvale, the x64 creators held a press conference to demonstrate Zen CPU performance. Based strictly on the presentation, AMD claims a 40% IPC (instructions per clock) improvement over Excavator. The demonstration used a 16-thread processor, the “Summit Ridge” chip that's been discussed a few times, which runs 8 cores with simultaneous multi-threading (SMT) for 16 total threads. For the non-gaming market, AMD also showed “Naples,” a 32C/64T Zen server processor, running in a dual-CPU Windows server.
AMD detailed more of the Zen architecture in an official capacity, commenting on new caching routines and branch prediction, alongside the SMT changes that move AMD away from its modular Bulldozer architecture. AMD also mentioned “fanless 2-in-1s” in addition to high-performance CPUs and embedded systems.
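An IPC figure only translates into performance once clock speed is factored in: to a first approximation, single-thread speedup is the IPC ratio multiplied by the clock ratio. The sketch below applies the claimed 40% IPC gain at equal clocks and at a hypothetical lower clock (3.6GHz vs. 4.0GHz, chosen purely for illustration). It assumes the workload scales with clock and that IPC is constant across the workload, which real benchmarks rarely guarantee.

```python
def relative_performance(ipc_ratio: float, clock_ratio: float) -> float:
    """First-order single-thread speedup: (new IPC / old IPC) x (new clock / old clock)."""
    return ipc_ratio * clock_ratio

# The claimed 40% IPC gain at identical clocks.
same_clock = relative_performance(1.40, 1.0)

# Hypothetical: the new core runs at 3.6GHz while the older one runs at 4.0GHz.
lower_clock = relative_performance(1.40, 3.6 / 4.0)

print(f"same clock:  {same_clock:.2f}x")
print(f"lower clock: {lower_clock:.2f}x")
```

A large IPC gain can be partly offset by a clock deficit, which is why AMD's clock speeds matter as much as the IPC claim itself.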