Canonical View of the Render Pipeline & Graphics Optimization
This part of the interview was almost entirely for my own knowledge. We review a lot of hardware – especially graphics hardware – and understanding the underlying software empowers us to more competently test components. Seeing an opportunity to learn (and, ultimately, produce more content), I asked for a canonical view of the render pipeline as it pertains to Star Citizen and CryEngine.
“Because it is threaded, essentially stuff gets pushed onto the render thread. So what happens [at] the culmination of the main thread, render gets called, and render basically – on the main side – starts pushing the objects. In CryEngine they're called 'render objects.' They get pushed off to the render queue. It gets pushed on with the information about the object, including its location in 3D space, because it's got to be threaded so you can't [depend] on the stuff that's sitting in the main thread. So [the engine] pumps all this stuff onto the render thread, then the main loop goes about doing its stuff again, while the render thread carries on doing its stuff.
“The rendering and the main loop are happening concurrently all the time. Just toward the end of the main loop, the work the main loop has done gets pushed to the render thread and the render thread goes about the work of transforming [(ed. note, read: scaling, resizing, modifying visual output)].”
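The handoff Roberts describes can be sketched in a few lines. Below is our own minimal illustration (Python, not CryEngine's actual C++): the main loop snapshots each object's state into a command and pushes it onto a queue, while a separate render thread consumes those commands concurrently. All names here are illustrative.

```python
# Sketch of a main-thread/render-thread handoff: the main loop copies
# each object's state into a command (so the render thread never reads
# data the main thread may still be mutating) and queues it.
import threading
import queue

render_queue = queue.Queue()
rendered = []  # frames the render thread has processed

def main_loop(frames):
    for frame in range(frames):
        # Snapshot: copy position into the command, don't share references.
        commands = [{"object": "ship", "frame": frame,
                     "pos": (frame * 1.0, 0.0, 0.0)}]
        render_queue.put(commands)
    render_queue.put(None)  # sentinel: no more frames

def render_loop():
    while True:
        commands = render_queue.get()
        if commands is None:
            break
        # "Transforming" stands in for the real pipeline work.
        rendered.append([cmd["pos"] for cmd in commands])

rt = threading.Thread(target=render_loop)
rt.start()
main_loop(3)
rt.join()
print(len(rendered))  # 3 frames made it through the queue
```

The key property is the one Roberts calls out: the two loops overlap in time, and only the queued snapshots cross the thread boundary.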
CryEngine has historically run on a few threads – render, game logic, game physics – but recent updates have refreshed the engine to better utilize all available CPU threads. The Star Citizen team has worked with former (and current) CryEngine developers to increase high-end hardware support, yielding greater software-hardware efficiency and improving overall performance across various configurations.
“Then, on the 3D engine side, it sort of does the various things it needs to do, like passes to depth buffer, the shadow maps, all the stencils, and blasts it out. The CryEngine render side itself is a pretty tricky, complicated beast, so that's not one that I would be personally excited about refactoring (laughs). We've got a lot of people here who actually wrote it, so they would all be [doing the refactoring] and they know it better than me. I would say that it was written back in the day when it was less modularized and less object-oriented or -based, so there's a lot of interdependent code or checks on flags that you probably wouldn't necessarily do in today's world. Those are part of the [refactoring].”
Roberts then goes on to make an analogy to home improvement, relating the team's CryEngine rebuild to the purchase of a new, otherwise good house with “plumbing problems.”
At the GPU and API level, we know that geometry and vertices are processed first and transformed into screen space, after which point shading is applied to the scene. This includes rasterization, interpolation, texturing, and pixel shading. Before all of that, though, the GPU works alongside the game engine to determine which assets are unnecessary for the current scene, then culls out those assets. One example is frustum culling, which removes geometry falling outside of the camera's view; another is occlusion culling, which removes geometry hidden behind objects drawn in front of it (think: a car in front of a building, hiding part of that building). In either case, the unnecessary geometric data is culled (removed) from the pipeline, reducing workload during the shader and rasterization passes. Additional culling is performed throughout the pipeline as various bits of data are deemed unnecessary by the GPU. If you can't see the data, chances are that it was not visually rendered in the frame (insert “tree in a forest” adage as applicable).
(Above Source: Carnegie Mellon University)
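As a toy illustration of the frustum-culling step (our own sketch, not any engine's code), the snippet below keeps only objects whose position falls inside a simplified 90-degree view cone; occlusion culling – discarding the parts of the building hidden behind the car – would be a separate pass not shown here.

```python
# Toy frustum culling: objects outside the camera's view volume are
# dropped before any shading work happens. The camera looks down +z
# with a 90-degree FOV, so a point is inside when |x| <= z and |y| <= z
# (tan(45 degrees) == 1). Real engines test bounding volumes against
# six frustum planes; a single point keeps the idea visible.
def in_frustum(x, y, z, near=0.1, far=1000.0):
    if not (near <= z <= far):
        return False
    return abs(x) <= z and abs(y) <= z

scene = {"car": (0.0, 0.0, 5.0),        # straight ahead: kept
         "building": (2.0, 0.0, 10.0),  # inside the cone: kept
         "bird": (50.0, 0.0, 5.0)}      # far off to the side: culled

visible = [name for name, pos in scene.items() if in_frustum(*pos)]
print(visible)  # ['car', 'building']
```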
The memory buffers are accessed concurrently through all of this processing, fetching and swapping buffered texture data and vertex data, and packaging the output image for dispatch to the display. Some games offer the option to “pre-buffer” a set number of frames. Anyone who has played with this option – Far Cry and Watch Dogs both offer it – will be familiar with its impact on performance. Pre-buffering frames queues additional work for the GPU and occupies greater space in the buffers as images are compiled and prepared for future display. If you pre-buffer two frames, that means your GPU is packaging and shipping two frames down the pipe ahead of their display, which aids in fluidity (smoother, more consistent frame delivery) at the cost of input latency and memory.
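A rough way to reason about that trade-off (our own back-of-envelope arithmetic, not measured data): each pre-rendered frame in the queue adds roughly one frame-time of input latency, since the frame you see was prepared that many frames ago.

```python
# Back-of-envelope estimate of input latency added by pre-rendered
# frames. Illustrative numbers only; real latency also depends on the
# display, driver, and engine.
def added_latency_ms(prerendered_frames, frame_time_ms):
    # Each queued frame delays display of new input by one frame-time.
    return prerendered_frames * frame_time_ms

frame_time = 1000.0 / 60.0  # ~16.7 ms per frame at 60 FPS
for depth in (1, 2, 3):
    print(depth, round(added_latency_ms(depth, frame_time), 1))
```

At 60 FPS, a queue depth of two already costs on the order of 33 ms of extra latency – noticeable to sensitive players, which is why the option is exposed at all.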
We're planning a future article and video to discuss this in greater depth.
Roberts, getting back to the interview, talked about his team's composition as it pertains to CryEngine and its ongoing refactoring:
“We have quite a few of the original core engine team. We've got, I think, three of the original Far Cry team and quite a few [from] Crysis 1-onwards from the engine team. We've got people who helped build the engine, design and architect it, so they know it inside-out. That's very beneficial for us because, usually, if you're a licensee you don't have that knowledge, which is useful for us because we're trying to do something different than you would normally use CryEngine for. I mean, let's be honest, what we're doing – it doesn't matter if it's Unreal, Unity, or CryEngine, a lot of the refactoring we'd have to do [anyway].”
I briefly interjected, commenting on the team's conversion to a 64-bit platform.
“64-bits are big, also just kind of approaching spatialization and data-passing differently because of the vast scale and scope, and the fact that we have high density [areas] of information but also vast areas of nothing. More [usual] spatialization, like octrees and stuff, aren't typically useful for what we do.”
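The 64-bit point is easy to demonstrate. A 32-bit float carries roughly seven significant digits, so far from the map origin it can no longer resolve small movements. Our own illustration below shows a 1 m step vanishing entirely at a coordinate of 100,000 km (expressed in meters) when positions are stored at 32-bit precision.

```python
# Why 64-bit world positions matter at solar-system scale: at a
# coordinate of 1e8 m, the spacing between adjacent 32-bit floats is
# 8 m, so a 1 m step rounds away to nothing. 64-bit doubles keep it.
import struct

def to_f32(x):
    """Round a Python float (64-bit) to 32-bit float precision."""
    return struct.unpack("f", struct.pack("f", x))[0]

pos = 1.0e8   # 100,000 km from the origin, in meters
step = 1.0    # a 1 m player movement

print(to_f32(pos + step) == to_f32(pos))  # True: the step is lost at 32-bit
print((pos + step) == pos)                # False: 64-bit preserves it
```

This is the precision wall that forces engines built for kilometer-scale maps to be reworked before they can handle planetary distances.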
Octrees, Data Structure, Zoning, & Network Optimization
We've heard Roberts mention octrees and data structure a few times in past interviews, but never seized the opportunity to ask the developer what those are and what they do. This time, I asked, rather plainly, “you've brought that up with me before – octrees – what are those?”
Roberts replied in depth:
(Above Source: Wikipedia)
“It's like sub-dividing. I take a cube, then I can go inside this cube and say you imagine sub-dividing into four more cubes [(ed. note: strictly, an octree subdivides each cube into eight child cubes; a quadtree divides a 2D square into four)]. Then when you go into one of those cubes, you can subdivide it into four more cubes. It's basically a way to [partition] areas of data to figure out whether you're occluded or not occluded, or whether you're visible in the camera, fairly efficiently [and] without having a flat list. It's one thing to have a flat list of objects and you could just check against the view frustum whether they're in the camera or not, but if you have 10,000 objects in the scene, you check against 10,000. Now, doing something like this, you could basically take the view frustum and essentially figure out what sort of containers would be visible [against an octree], then you deal with the objects. So you move the render objects around [in] an octree structure, and it's just sort of an efficient way to get to a pass and figure out what objects are in what area pretty quickly.
“It's more rigid – you start with a certain size and [subdivide], down, and down, and down. It's not as flexible or movable. This is what the zone system does – imagine having a planet, and this planet has a city and other people on it. The zone system – really, what we call the 'zone container' – holds all that. If that planet is orbiting, moving throughout space, what would happen in an octree? An octree is fixed – that would mean the planet and the objects are all transitioning from one area of the octree to another area of the octree. You'd have to be moving them around, you'd be sorting them inside your octree all the time. You'd be sorting thousands of objects. With us, that's not the case because we have these zone containers, and the zone containers contain their own frame of reference, so it's basically like a Russian doll – we contain things within things. At the top level, we move the top level around. You don't have to move anything inside that top level because they're still all relative to each other perfectly.”
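To make the structure concrete, here is a minimal point octree of our own (illustrative, not CryEngine's implementation): each node is a cube that lazily splits into eight octants, and a range query – standing in for the view frustum – descends only into octants it overlaps, skipping most objects without ever testing them individually.

```python
# Minimal point octree: cubes split into eight child octants on demand,
# and a box query prunes whole octants instead of checking every point.
class Octree:
    def __init__(self, center, half, depth=0, max_depth=4):
        self.center, self.half = center, half
        self.depth, self.max_depth = depth, max_depth
        self.points = []
        self.children = None  # lazily created list of 8 octants

    def insert(self, p):
        if self.children is None:
            if len(self.points) < 4 or self.depth == self.max_depth:
                self.points.append(p)
                return
            self._split()
        self.children[self._octant(p)].insert(p)

    def _split(self):
        h = self.half / 2.0
        cx, cy, cz = self.center
        self.children = [
            Octree((cx + dx * h, cy + dy * h, cz + dz * h), h,
                   self.depth + 1, self.max_depth)
            for dx in (-1, 1) for dy in (-1, 1) for dz in (-1, 1)
        ]
        for p in self.points:
            self.children[self._octant(p)].insert(p)
        self.points = []

    def _octant(self, p):
        # Index matches the order children were built in _split().
        ix = 1 if p[0] >= self.center[0] else 0
        iy = 1 if p[1] >= self.center[1] else 0
        iz = 1 if p[2] >= self.center[2] else 0
        return ix * 4 + iy * 2 + iz

    def query(self, lo, hi):
        """Return points inside the axis-aligned box [lo, hi]."""
        # Prune: skip this whole octant if it doesn't touch the box.
        for i in range(3):
            if (self.center[i] + self.half < lo[i]
                    or self.center[i] - self.half > hi[i]):
                return []
        found = [p for p in self.points
                 if all(lo[i] <= p[i] <= hi[i] for i in range(3))]
        if self.children:
            for child in self.children:
                found += child.query(lo, hi)
        return found

tree = Octree(center=(0.0, 0.0, 0.0), half=100.0)
for i in range(10):
    tree.insert((i * 10.0, 0.0, 0.0))   # points along the x axis
near_origin = tree.query((-5.0, -5.0, -5.0), (25.0, 5.0, 5.0))
print(sorted(near_origin))              # the points at x = 0, 10, 20
```

The pruning in `query` is exactly the win Roberts describes: instead of testing all 10,000 objects against the frustum, you reject whole containers at once. The cost he objects to is also visible here – if the points themselves move, they must be re-inserted into the right cells.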
That's a lot of data packed into a dense space, but the logic isn't hard to follow once read carefully. The zone system, as we've already discussed, allows containers of data to be shuffled at the very top level, so the entire data set doesn't have to be checked and moved as it would be in a more typical octree deployment. This reduces load on the system and, more critically for Star Citizen, on the network and servers. Reducing the moving data to a smaller subset of pointers benefits the load potential (ultimately, the player count) of each instance within the game.
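The Russian-doll idea can be sketched in a few lines (our own illustration; all names hypothetical): children store positions relative to their container, so orbiting a planet is a single write to the top-level zone, while everything inside it keeps its local coordinates untouched.

```python
# Sketch of nested "zone containers": each zone stores its offset
# relative to its parent, so moving the planet's zone is one write and
# nothing inside the zone needs to be touched or re-sorted.
class Zone:
    def __init__(self, offset):
        self.offset = offset          # position relative to the parent zone
        self.children = []

    def add(self, child):
        self.children.append(child)
        return child

    def world_pos(self, parent_origin=(0.0, 0.0, 0.0)):
        # World position = parent's origin plus this zone's local offset.
        return tuple(p + o for p, o in zip(parent_origin, self.offset))

space = Zone((0.0, 0.0, 0.0))
planet = space.add(Zone((1.0e6, 0.0, 0.0)))   # planet's zone within space
city = planet.add(Zone((10.0, 0.0, 0.0)))     # city, relative to the planet

before = city.world_pos(planet.world_pos())
planet.offset = (2.0e6, 0.0, 0.0)             # the planet orbits: one write
after = city.world_pos(planet.world_pos())
print(before, after)  # the city moved with the planet; its data is unchanged
```

Contrast this with the octree case above it in the article: there, the orbiting planet would drag thousands of objects across cell boundaries, each requiring a re-sort.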
An octree works well with more “normal” types of games, like an FPS, where the frame of reference is a large group of fixed objects (buildings, map elements, terrain features) with a very small list of movable objects (player characters, bullets). Octrees tell the renderer what needs to be drawn to the camera by culling out data that is invisible from the player's point of view, working efficiently to minimize render calls and bandwidth consumption in the pipeline.
“So [the zone system] allows us to do things like planetary orbits, ships flying around with lots of stuff inside, and not worry about moving them around and fixing their spatial data structure. It's much more useful for our purposes, but it's not normally so useful in a 3D game. Usually, in a 3D game, the structure itself is fixed – walking around a map – the plants, the buildings, they aren't moving. That's fine, because the octree allows you to determine which plants and what part of the building you're able to see really quickly, and the only things moving through them are your character or a bullet or something like that.
“For a typical shooter, most engines use an octree for [the] render pipeline to determine what you see and what you won't see. For us, it's not that suitable – we'd be doing a bunch of work we'd rather not be doing. Even though octrees are fairly efficient at sorting data and moving data around, since you're generally just moving pointers around – when it's a big enough space and big enough data – it does actually take an appreciable time. You can always think of a big spaceship or a planet as a mini-FPS level [in what we do], so it allows for movement of these things without having to have the challenges of a fixed spatial data structure. We can sort of scale the data structure inside each one of the zones to be appropriate for the data we have inside of it.”
To learn more about this system – although it has advanced in development – we'd recommend watching our previous interview specifically on zoning. Find that below this block.
If you like this type of content, you can help us in our production either directly, via Patreon, or via interview suggestions. If there are any game developers you'd like us to speak to, tweet us your suggestions and we'll try to arrange a discussion!
- Steve “Lelldorianx” Burke.