The Vulkan build is steadily getting more and more functionality. Now the core rendering pipeline in SceneEngine is working – which means we can have deferred lighting, shadows, IBL, tonemapping, etc. A simple scene should render correctly now. But there are some inefficiencies and issues (see below).
Unfortunately the DirectX11 version isn’t working at the moment. This is all in the “experimental” branch.
Declarative render passes
Tone-mapping now works, and it was a good prototype for the “declarative” render pass model. This allows us to specify the render passes and render targets required using a “description” structure. The system will do some caching and correlate requests to resources, creating and binding as necessary.
There is some overhead with this design because it involves doing some per-frame hash and lookups as we go along. It’s not as efficient as (for example) just pre-creating all of the frame buffer / render pass objects in a configure step. However, this design is maybe a little more flexible and easier to tie into existing scene engine code. In effect, we’re building a new layer that is just one step more abstract from the underlying Vulkan objects.
I’ve pushed some of the “busy-work” (like declaring subpass dependencies) down into the RenderCore::Metal layer. This makes the interface easier to use… But the downside is that my abstraction is not expressive enough for some unusual cases. For example, I came across a case where we want to bind the “depth & stencil” aspects of a depth texture in one subpass, and in the second subpass only the “stencil” aspect is bound. This apparently needs a dependency… But it’s just really inconvenient to express with this interface.
I’ve also built a concept called “named resources” into the Metal::DeviceContext. This allows us to get TextureViews for attachments from the device context. It feels out of place because it’s an operation that doesn’t involve the hardware, but there doesn’t seem to be any better way to handle this case.
Fundamentally we want to define attachments in FrameBufferDesc objects, so that we can later refer to them again by binding id. It would be better if some of this functionality was in the RenderCore::Techniques library… But it would be just too much hassle to split it between Techniques and Metal.
Anyway, it’s working now in Vulkan. However, I still haven’t got to the caching and reuse part. And it also needs to be implemented for DirectX11!
Compute shader work!
I added in support for the compute pipeline. It actually was pretty easy. I decided to switch some of the tonemapping code from pixel shaders to compute shaders – because this seems to be more natural in Vulkan. Working with viewports and render targets is much more complex in Vulkan than in DirectX11.
Render state objects working & dynamic pipeline objects
Now all of the render state objects (like BlendState, SamplerState, etc) work. However all of the “pipeline objects” are dynamically created as needed. Vulkan allows these to be pre-calculated and optimised at load time. This is pretty smart, because a lot of that render state information can just become shader instructions and be combined into the shader code.
I think the SharedStateSet techniques might be able to precalculate pipeline objects, but many pipeline objects will still have to be created dynamically like this. Perhaps I can use the “inheritance” stuff in pipeline objects to calculate some things earlier (for example, in ShaderPrograms).
For the moment, pipeline objects are just created and recreated like crazy!
Separated Samplers and Textures in HLSLCrossCompiler
I found that GL_KHR_vulkan_glsl added support for separate Sampler and Texture objects to GLSL. So I added support for this to the HLSLCrossCompiler! This is a huge help, because otherwise it would be a hassle to work around the standard HLSL model of separate objects.
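To illustrate what the extension buys us (shader fragments only, not complete shaders): HLSL’s separate Texture2D/SamplerState pair maps directly onto GL_KHR_vulkan_glsl’s separate `texture2D` and `sampler` types, recombined at the sample site.

```
// HLSL (the model XLE already uses):
Texture2D     DiffuseTexture : register(t0);
SamplerState  DefaultSampler : register(s0);
//   float4 c = DiffuseTexture.Sample(DefaultSampler, uv);

// GLSL with GL_KHR_vulkan_glsl (what the cross compiler can now emit):
layout(set = 0, binding = 0) uniform texture2D DiffuseTexture;
layout(set = 0, binding = 1) uniform sampler   DefaultSampler;
//   vec4 c = texture(sampler2D(DiffuseTexture, DefaultSampler), uv);
```

Without this, the cross compiler would have to synthesize a combined `sampler2D` for every texture/sampler pairing a shader uses.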
Many fixes in HLSLCrossCompiler
As I go through, I’m finding issues in the HLSLCrossCompiler. It’s quite strange because the compiler supports so many instructions and modes… But there are certain things that are just incorrect. In particular, I’ve made some fixes to certain swizzle operations, and some fixes dealing with constant buffers.
It’s really quite fascinating because this cross compiler path works so well for so many shaders… I’m really using a lot of HLSL features in XLE, and I’m often happily surprised by the cross compiler just “figuring it out.” But then suddenly some shader will use an instruction in a slightly different way, and everything falls apart.
Anyway, it’s getting more reliable as I go along.
Better model for descriptor sets
I finally settled on a much better model for descriptor sets. I borrowed some ideas from the DirectX12 “root signatures” – these allow us to create a mapping between the linear register mapping in HLSL (eg, register(t2), register(b3), etc) and a system of descriptor sets and binding points.
So I’ve re-purposed that idea for XLE. The root signature actually also defines the “descriptor set layouts.” This allows us to create a few global layouts, and reuse them for all shaders. This seems to be the best way.
We have 4 descriptor sets available; I’m using one for BoundUniforms, one for “dynamic” bindings (not-BoundUniforms) and one for global resources that remain bound the entire frame. And so there’s one left over; maybe it will be used for input attachments.
There’s still some thrashing that can occur. But this design seems reasonable. I may expand the interface for BoundUniforms to allow for more efficient use.
More to come…
So it’s working! But there’s still a lot to come. Some things got broken while changing things, and there are still a lot of performance improvements to make. But it’s looking ok so far.