I've begun working on light buffer generation. While tackling this, I realized my normal buffer generation isn't accurate. I'm storing the view-space normal compressed to XY only, with Z reconstructed. While 16-bit float keeps the accuracy pretty well, we lose Z's sign, and that does not bode well for us. The reason is that while Z mostly faces the viewer, there are bound to be edge cases where it doesn't, especially when a normal map distorts a surface's normal. So I began searching for ways to improve my normal buffer encoding. To my surprise, someone by the name of Aras had already done the homework for me. Thanks Aras! :)

Just so you know, I'm not a math junkie, so deciphering what was done on that page took me quite a while. That said, I picked two methods out of the selection to test: Stereographic Projection and the CryEngine 3 method. From what I understood, CryEngine 3's method sacrifices accuracy in X and Y for a very good Z value, while Stereographic Projection distributes the distortion more evenly. However, since we're using a two-channel 16-bit float G-Buffer format, the data loss is pretty insignificant either way. After some testing, I came to the conclusion that both methods produce almost identical lighting; the difference is unnoticeable to the naked eye. Hence, I decided to use the cheaper of the two, which happens to be Stereographic Projection.
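To make the fix concrete, here is a small sketch of stereographic normal encoding in Python (function names are mine, not from our actual shaders). The key property is that the round trip preserves Z's sign, which plain XY-plus-reconstructed-Z storage cannot do:

```python
def encode_stereographic(n):
    """Project a unit normal (x, y, z) onto two values.
    Unlike storing XY and reconstructing z = sqrt(1 - x*x - y*y),
    this keeps z's sign. Undefined only at n = (0, 0, -1)."""
    x, y, z = n
    return (x / (1.0 + z), y / (1.0 + z))

def decode_stereographic(p):
    """Invert the projection back to a unit normal."""
    px, py = p
    denom = 2.0 / (1.0 + px * px + py * py)
    return (denom * px, denom * py, denom - 1.0)

# A back-facing view-space normal survives the round trip, sign and all:
n = (0.4242640687, 0.5656854249, -0.7071067812)
print(decode_stereographic(encode_stereographic(n)))
```

In a real shader the two encoded values would also be scaled and biased into the render target's range, which I've left out here.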

So.. Normal G-Buffer fixed!

Now, as we know, shadow casting is a very expensive process, so for the most part we want to avoid it. However, this leads to one problem: light leaking. Given a light in a room, if the light is not shadow casting, it will leak through the wall into the adjacent room, which is ugly behavior. So how does one fix this? With forward rendering, we had the flexibility of selective light masking, which worked very well for this case. But how do we bring it to the deferred model? From what I can see, my best bet is the stencil buffer. However, I'm not quite sure how cheap stencil operations are; from what I've read, clearing the stencil buffer is expensive. Also, the stencil is only 8 bits on any modern card (the other 24 bits belong to the depth buffer). On top of that, for fill-rate optimization we would want to mask out the depth bounds of each local light's volume, which takes up 1 bit of the stencil, leaving only 7 for light grouping.
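The bit budget above could be sketched like this (just my planned split, with illustrative names; nothing here is final):

```python
DEPTH_BOUND_BIT = 0x80   # top bit reserved for light-volume depth masking
LIGHT_GROUP_MASK = 0x7F  # remaining 7 bits: light-group IDs for masking

def make_stencil_ref(light_group):
    """Pack a light-group ID into the low 7 bits of the stencil ref."""
    assert 0 <= light_group <= LIGHT_GROUP_MASK
    return light_group

def light_affects_pixel(stencil_value, light_group):
    """A light only shades pixels whose stencil group matches its own,
    ignoring whatever the depth-bound bit is currently set to."""
    return (stencil_value & LIGHT_GROUP_MASK) == light_group
```

So a light in one room simply carries a different group ID than the geometry of the adjacent room, and the stencil test rejects those pixels without any shadow casting.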

I am not familiar with the stencil buffer and do not know exactly what cost it would impose on rendering. But writing and switching the stencil ref value during the G-Buffer rendering phase seems scary to me, since I would have to keep setting the stencil ref for each Renderable being rendered. From what I gather, the best way to implement this is to add a RenderObjectListener during the G-Buffer render state, set the stencil ref value for each renderable in the callback, then remove the listener once the G-Buffer pass is done. However, there is one issue with this: the callback deals with Renderables, so I have no way to find out the light mask for a Renderable, since the light mask is set on the MovableObject. As of now, I have no clue how to handle this.
Posted by Lf3T-Hn4D
I started working on our deferred lighting / light pre-pass / semi deferred / or whatever you wanna call it renderer. Right now, all it does is just generate the appropriate G-Buffer.

As simple as it looked, it wasn't so for me, since our engine has a pretty complex mix of custom materials. I had to restructure the way we sort our render queues to filter unwanted meshes out of the G-Buffer pass. One drawback of our current system is that our models consist of both shaded and unshaded materials, e.g. buildings with holographic ads. The other is the AO boxing overlay that shares a vertex buffer with the mesh it overlays. So in the end, I had to hijack them away during the renderableQueued() callback stage. This method feels ugly, but I can't think of any other way to solve it. Since the problem is mostly about filtering submeshes, there's no way to use Ogre's visibility flag.

Nevertheless, it's done, and I'm beginning to see how to build this renderer in the most optimal way possible (in Ogre). I must admit that the reason I'm working on this now is all thanks to the Ogre dudes; without the recently added features, I probably wouldn't be. So lemme give my thanks to Noman and dark_sylinc for making this possible. :) Oh, I should also thank Google, since they technically funded Ogre's compositor improvement.

As always, I like posting some graphics since I'm a graphics person. So, without further ado, here they are:
G-Buffer normal:

G-Buffer depth:

and since the depth buffer looked so strangely satisfying, here are two more just for kicks.

Don't you think they set the post-apocalyptic mood? :-)
Posted by Lf3T-Hn4D
As we progressed with building our second level, we hit one major issue with our rendering solution: lights. Since level 2 is an underwater level, it requires tons of lights to set the mood. Our current rendering method is a forward renderer. This means that for each mesh lit by a light, an additional pass is needed to draw the affected mesh, which becomes a problem as the GPU ends up re-rendering meshes for every light.
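The cost difference is easy to see with some back-of-the-envelope arithmetic (the numbers below are made up for illustration, not measurements from our level):

```python
def forward_pass_count(num_meshes, lights_per_mesh):
    """Multi-pass forward lighting: one base pass per mesh, plus one
    additive pass per (mesh, light) pair."""
    return num_meshes * (1 + lights_per_mesh)

def deferred_pass_count(num_meshes, num_lights):
    """Deferred-style lighting: geometry is rendered once into buffers,
    then one screen-space pass per light, independent of mesh count."""
    return num_meshes + num_lights

# E.g. 200 meshes each touched by 20 lights:
print(forward_pass_count(200, 20))   # forward pass count blows up
print(deferred_pass_count(200, 20))  # deferred stays nearly flat
```

The forward cost scales with meshes times lights, while the deferred cost scales with meshes plus lights, which is exactly why a light-heavy underwater level pushes us off the forward path.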

To solve this problem, most games use pre-baked light maps. However, we opted not to rely on that method, as our tool chain is designed for more dynamically built levels (levels built by putting prefabs together). Also, we prefer a dynamically lit scene where we can freely move lights around; we believe it will actually save us more time compared to baking lighting, which would only slow down level building.

Having said that, I've decided to tackle this problem the deferred way. However, as we know, a full deferred renderer has many caveats: it's problematic with varied materials, and expensive due to sampling four render targets per light. Thankfully, there is a simpler method known as light pre-pass rendering, introduced by Wolfgang Engel.

The light pre-pass renderer basically does a lighting pass that accumulates all lighting into a light buffer, without the diffuse/albedo part of the scene. The idea is to use fewer render textures (RTs), and hence less texture sampling, during the lighting stage.
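Conceptually, the two stages look like this (a per-pixel sketch with hypothetical names, diffuse-only, ignoring attenuation and specular for clarity):

```python
def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def accumulate_light_buffer(normal, lights):
    """Light pass: sum N.L-weighted light colours for one pixel.
    Only the normal (from the G-Buffer) is needed; no albedo yet."""
    total = [0.0, 0.0, 0.0]
    for direction, colour in lights:
        ndotl = max(0.0, dot(normal, direction))
        for i in range(3):
            total[i] += ndotl * colour[i]
    return tuple(total)

def compose(albedo, light_buffer):
    """Second geometry pass: modulate material albedo by the
    accumulated light buffer."""
    return tuple(a * l for a, l in zip(albedo, light_buffer))
```

The point is that the expensive per-light work only ever touches the small G-Buffer, and the full material system runs exactly once per pixel in the composition pass.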

So I began to draft out how I would build my G-Buffer and handle the rendering pipeline.

Below is how I plan to lay out my G-Buffer:

Note that RT2 is optional and only generated when we want to have full scene motion blur.

Then, to fit the light pre-pass pipeline into our existing shading pipeline, this is what I'm planning to do:

An interesting thing to note is that this method does not incur much extra memory compared to the full deferred approach. In fact, the additional memory required for the G-Buffer is pretty much what we already allocate when refraction is enabled in our current pipeline. Hence, the refraction effect becomes almost free in terms of memory.

Obviously, this is only one side of it. On the other side of things, we actually need a light buffer, and this is where things start to cost more. For the sake of HDR, we would need to use PF_FLOAT16_RGB or PF_FLOAT16_RGBA. Both formats effectively cost 64 bits per pixel, which means double the cost of a normal render buffer. For an accurate lighting model, we would need two render targets: one for N.L Phong lighting, the other for N.H specular lighting. So we would impose an extra memory requirement just to build a perfect specular lighting model. Annoying. Hence, I'm planning to have an approximate specular mode where we only keep specularity as a luminance value. This way, we use only one light buffer, and the specular colour is derived from the diffuse light and the material. It shouldn't look too bad, I hope.
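The approximate mode might look something like this (my sketch of the idea, not final shader code: diffuse light in RGB, scalar specular in A, with specular colour reconstructed from the diffuse light's chrominance):

```python
def pack_light_buffer(diffuse_rgb, spec_luminance):
    """One RGBA16F target: accumulated diffuse light in RGB,
    specular kept only as a luminance value in A."""
    return (diffuse_rgb[0], diffuse_rgb[1], diffuse_rgb[2], spec_luminance)

def luminance(rgb):
    # Rec.709 luma weights
    return 0.2126 * rgb[0] + 0.7152 * rgb[1] + 0.0722 * rgb[2]

def reconstruct_specular(packed):
    """Tint the scalar specular term with the chrominance of the
    accumulated diffuse light, instead of storing true N.H colour."""
    r, g, b, spec = packed
    lum = luminance((r, g, b))
    if lum <= 0.0:
        return (0.0, 0.0, 0.0)
    scale = spec / lum
    return (r * scale, g * scale, b * scale)
```

The trade-off: specular picks up whatever colour the diffuse lighting has, so a surface lit by two differently coloured lights gets a blended specular tint instead of two correct ones, but we save an entire 64bpp render target.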

Aside from that, this new method does introduce a problem: the opaque part of the scene has to be rendered twice, once to fill the G-Buffer and a second time for the actual material and light composition. This basically means the initial rendering would not be faster than our forward rendering approach; the win is the very low cost of each additional dynamic light. Personally, I hope this is the correct direction to head in. I do know cards in the range of the 8800GTS (which is mid-range by now) can handle this no problem. However, lower-end cards would suffer, e.g. the 9600GTS, which is what I am using for testing. It's already trudging along at 30fps or less with all features on at 1024x768, which is bad for a racing game.
Posted by Lf3T-Hn4D
Liquid Rock Games and Project Aftershock. All Rights Reserved.