As mentioned previously, I pointed out two methods to render light volumes. I have successfully implemented the first technique for both point and spot lights. Here's a screenshot of it in action (4 point lights & 1 spot light):

Notice the frame rate on my crappy hardware. This is with shadows, HDR and refraction on. Previously, when we were still using forward rendering, I got about the same FPS, at ~40. This proves my speculation right: with all features on, I do not see any significant frame rate change. In fact, we gain the ability to do cheap lighting! The old renderer would have done a horrible job of producing the screenshot above. Hence, deferred lighting FTW! :)

However, the current method is rather brute force in that every light uses the stenciling technique, which means we might hit a limit quite quickly (probably around 30+ lights). To solve this issue, I have plans for light batching. The idea is to group lights that each cover only a small area in screen space, then use shader instancing to render them without the stencil pass. Obviously this will end up shading some redundant portions of the screen. However, I do believe the savings from fewer render state changes will offset this cost. As to what the right screen ratio is, I currently have no clue. I probably need to build a test level with tons of lights to figure it out. :)
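To make the batching idea a bit more concrete, here is a rough C++ sketch of how a light could be sorted into one path or the other by its approximate screen coverage. Everything in it is an assumption for illustration: the names `screenCoverage` and `chooseLightPath` are hypothetical, the coverage estimate is a simple projected disc that ignores perspective distortion, and the threshold is exactly the number I do not know yet.

```cpp
#include <algorithm>

const double kPi = 3.14159265358979323846;

// Approximate fraction of the screen covered by a point light's bounding sphere.
// viewDepth   : distance of the light centre in front of the camera (assumed positive)
// radius      : attenuation radius of the light
// tanHalfFovY : tan(vertical field of view / 2)
// aspect      : viewport width / height
double screenCoverage(double viewDepth, double radius, double tanHalfFovY, double aspect)
{
    // Camera inside (or touching) the sphere: treat the light as full screen.
    if (viewDepth <= radius)
        return 1.0;

    // Projected radius in units where half the screen height equals 1.
    double projectedRadius = radius / (viewDepth * tanHalfFovY);

    // In those units the screen is 2 * aspect wide and 2 high.
    double discArea = kPi * projectedRadius * projectedRadius;
    double screenArea = 4.0 * aspect;
    return std::min(1.0, discArea / screenArea);
}

enum class LightPath { InstancedBatch, StencilVolume };

// Small lights go to the instanced batch, large ones keep the stencil pass.
// The threshold is a tuning value that still has to be measured.
LightPath chooseLightPath(double coverage, double threshold)
{
    return coverage < threshold ? LightPath::InstancedBatch : LightPath::StencilVolume;
}
```

With a threshold of, say, 5% of the screen, a light of radius 1 at depth 10 would land in the instanced batch, while any light the camera sits inside would keep using the stencil path.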

As for the second technique I mentioned in the last post, I intend to use it for custom non-convex light volumes. This method allows me to scrap the light group masking idea I had, which not only complicates rendering but also gives the level designers a lot of headaches. The goal of these custom light volumes is to let level designers define light volumes that will not leak into unwanted areas when there is no shadowing; for example, a spot light casting into an adjacent room.

So there, deferred lighting conversion, success! Woot! :-D I must say I have learned a lot over the course of attempting this rendering "upgrade". I finally understand the tricks of getting the view space position from the depth buffer. I was having some trouble, especially with figuring out how to generate proper far clip plane coordinates with a volume mesh on the screen. But nullsquared the genius solved it for me in a forum post I found. Still, let's not get into the details lest I become too long-winded, which I already am. :-P That's all for this post.

Posted by Lf3T-Hn4D
Local lights are the main factor that makes deferred lighting so appealing. The fact that they are rendered in screen space makes the process fillrate dependent instead of vertex bound; in the forward renderer, for each light that lights a piece of geometry, we have to send the whole geometry to the GPU for rendering.

However, deferred lighting incurs the fillrate issue that is typically encountered with particle effects. In the naive approach, one would render quads on the screen representing each light. This method, though it works well for small attenuating lights, does not work well at all for lights that cover a huge portion of the screen. Hence we would like to come up with a solution for optimal lighting where only affected pixels are rendered.

The answer to that problem is light volume rendering. Instead of rendering quads on the screen, we render volume meshes that represent the lights: a sphere for point lights and a cone for spot lights. Since they are real meshes rendered in the scene, we can use the Z-Buffer depth test to cull areas that are occluded (Figure 1 - A). Unfortunately, this is not optimal most of the time. With only simple volume mesh rendering, unaffected surfaces behind the volume mesh will get uselessly rendered as well. Another way to go about this is to render the backfaces of the volume with a "greater equal" depth test (Figure 1 - B). This method culls unnecessary surfaces that are behind the light volume. However, it does so at the cost of never culling surfaces in front of the light volume. Hence such naive methods do not work well at all for what we are trying to solve.

To illustrate, let's look at a 2D visual of a given scenario:

Figure 1 - Naive light volume approaches

Interestingly, there are a few methods to solve this and they all use the stencil buffer. The idea in fact came from the good old stencil shadow volume rendering technique. There are many articles out there on this topic if anyone is interested. However, to summarize, these are the two more popular techniques used: the depth fail hybrid and the XOR stencil technique.

Let's start with the first. To understand this technique, we first need to take note of one interesting fact. If you look closely at Figure 1, you will realize that to get the desired result, we could combine the two naive approaches with an AND logic operation!

Figure 2 - AND logic operation
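As a quick sanity check of that observation, here is a tiny C++ sketch that treats a single pixel as three depth values and applies the two naive tests to it. The struct and function names are made up for illustration, and a convex volume is assumed so that there is exactly one front face and one back face per pixel.

```cpp
// Depths along one pixel's eye ray; a larger value is farther from the camera.
struct PixelSample {
    double volumeFront; // depth of the light volume's front face
    double volumeBack;  // depth of the light volume's back face
    double scene;       // depth of the scene surface stored in the Z-Buffer
};

// Naive approach A: draw front faces with a "less equal" depth test.
// Rejects surfaces in front of the volume but not the ones behind it.
bool frontFacePasses(const PixelSample& p)
{
    return p.volumeFront <= p.scene;
}

// Naive approach B: draw back faces with a "greater equal" depth test.
// Rejects surfaces behind the volume but not the ones in front of it.
bool backFacePasses(const PixelSample& p)
{
    return p.volumeBack >= p.scene;
}

// Only the AND of both tests isolates surfaces inside the volume.
bool insideLightVolume(const PixelSample& p)
{
    return frontFacePasses(p) && backFacePasses(p);
}
```

A surface in front of the volume fails A, a surface behind it fails B, and only a surface between the two faces passes both.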

Now with that in mind, let's talk about how the depth fail hybrid technique works. The reason I call it a hybrid technique is that, instead of the full depth fail stencil volume technique which uses two stencil render passes, it applies the first depth fail stencil pass and then combines the second depth test pass with the lighting render while applying the generated stencil mask. Effectively this is equivalent to the AND logic operation, but with better fillrate optimization due to using one stencil pass instead of two (compared to stencil shadows, that is).

Figure 3 - Z-Depth fail hybrid
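Here is how I would sketch the two passes on the CPU in C++, using a row of pixels and a software "stencil buffer". This is only a model of the render states, not actual renderer code. `VolumeSpan` and `litByDepthFailHybrid` are hypothetical names, and resetting the stencil in the second pass is my own assumption about how one would reuse the buffer for the next light.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// What the light volume looks like at one pixel.
struct VolumeSpan {
    double front;  // depth of the front face
    double back;   // depth of the back face
    bool covered;  // false if the volume does not touch this pixel
};

// Returns, per pixel, whether the lighting shader would run.
std::vector<bool> litByDepthFailHybrid(const std::vector<double>& sceneDepth,
                                       const std::vector<VolumeSpan>& volume)
{
    std::vector<std::uint8_t> stencil(sceneDepth.size(), 0);
    std::vector<bool> lit(sceneDepth.size(), false);

    // Pass 1: back faces only, depth test "less", no colour or depth writes.
    // Stencil is marked where the depth test FAILS (back face behind the scene).
    for (std::size_t i = 0; i < sceneDepth.size(); ++i) {
        if (!volume[i].covered) continue;
        bool depthPasses = volume[i].back < sceneDepth[i];
        if (!depthPasses) stencil[i] = 1;
    }

    // Pass 2: front faces only, depth test "less equal", stencil test "equal 1".
    // This pass runs the lighting shader, so no second stencil-only pass is needed.
    for (std::size_t i = 0; i < sceneDepth.size(); ++i) {
        if (!volume[i].covered) continue;
        bool depthPasses = volume[i].front <= sceneDepth[i];
        if (stencil[i] == 1 && depthPasses) lit[i] = true;
        stencil[i] = 0; // assumed: clear the mark so the next light starts clean
    }
    return lit;
}
```

Only the pixel whose scene depth falls between the front and back faces ends up lit, which is the same result as the AND of the two naive tests.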

Let's get on to the second technique. This is actually a less common technique because it is more fillrate intensive than the previous one. The concept is to flip the stencil value without any triangle face culling. Hence, it pretty much mimics the XOR logic operation. Truth be told, this technique can be done in two ways: Depth Pass (Figure 4 - First row) or Depth Fail (Figure 4 - Second row).

Figure 4 - XOR logic operation

From what I gather, the latter seems to be the implementation of choice, just as with the Z-Depth fail technique. If I understand it correctly, this is due to typical scene structure, where camera views mostly look at sparse areas instead of having plenty of near-eye occlusion. Hence it saves some fillrate in the stencil pass. Also, an interesting note is that using Z-Depth fail for this XOR method actually avoids the problems that occur when the camera is within the light volume.
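The XOR behavior is easy to model per pixel in C++: every volume surface crossing the eye ray flips one stencil bit, depending on the outcome of its depth test. The sketch below uses hypothetical names and assumes a closed volume with a strict "less" depth test. Faces behind the camera are simply absent from the list, which is how I model the camera being inside the volume.

```cpp
#include <vector>

enum class XorMode { DepthPass, DepthFail };

// surfaceDepths: depth of every light volume surface (front and back faces alike,
//                since culling is off) that this pixel's eye ray crosses.
// sceneDepth   : depth of the scene surface in the Z-Buffer.
// Returns the final stencil bit; true means "light this pixel".
bool xorStencil(const std::vector<double>& surfaceDepths, double sceneDepth, XorMode mode)
{
    bool stencil = false;
    for (double d : surfaceDepths) {
        bool depthPasses = d < sceneDepth;
        // Flip on depth pass or on depth fail, depending on the variant.
        bool flip = (mode == XorMode::DepthPass) ? depthPasses : !depthPasses;
        if (flip) stencil = !stencil;
    }
    return stencil;
}
```

For a concave volume crossing the ray at depths 2, 4, 6 and 8, both variants light a surface at depth 3 and skip one at depth 5. If the camera sits inside the volume so that only the face at depth 4 is visible, the depth fail variant still lights a surface at depth 3, while the depth pass variant gets it backwards.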

Having said that, the light volume stenciling technique obviously does not follow the stencil shadow techniques exactly. Due to this, the two techniques mentioned actually assume some limitations.

For the Z-Depth fail hybrid technique, the system assumes an enclosed convex light volume. It is also not applicable when the camera is inside the light volume (the front face is behind the camera, resulting in no lighting).

For the XOR Z-Depth fail technique, the system assumes an enclosed light volume. However, due to the nature of the XOR operation, the light volume can be concave as long as there are no intersection issues. This is an interesting behavior as it means we could have specially controlled light volumes that will not bleed into areas we do not want! (Think point and spot lights that have special light volumes to avoid bleeding into adjacent rooms.) Adding on to that, as mentioned before, this technique does not suffer from the camera-in-light-volume glitch. Obviously there's always a catch when things look so good. The fact that we have to turn off backface culling during the stencil pass means that our stencil rendering is more expensive.

Phew~ this is a long topic and I'm still not done yet. To think that most of the reference slides I could find online cover these methods in two slides. O_O Anyway, to conclude this post, I've outlined two methods that I will integrate into our deferred lighting pipeline. My next blog post will talk about when I would use them. ;-)
Posted by Lf3T-Hn4D
I have completed porting directional shadow casting and refraction to the new system. This means we now officially have all the original features running with a deferred lighting renderer. A test run with all features on (single shadowmap, HDR and refraction) got me an average of 40fps at 1024x768. Overall, that's pretty much a 10fps dip from the previous test. This is mostly due to the shadow casting cost, which shows how expensive shadow rendering is. I'm a bit disappointed with this since it makes PSSM useless even on fast machines. I'm planning to let artists define non-shadow-casting batched entities in the editor to narrow shadow casting down to where it matters. Hopefully this will improve shadow casting performance.

Aside from that, I've also changed the depth G-Buffer format to PF_FLOAT16_GR to add one more channel for a material ID. I concluded this was needed because of a limitation of any deferred renderer: handling different materials with different lighting properties. The main reason I added this functionality so soon was that our tree leaves and grass use custom shading, which stopped us from using the deferred lighting system to light them. That would be fine for the first level since there's only one light through the whole scene. However, we have plans for night scenes in the future, where it wouldn't work very well. Hence, with the material ID introduced, the custom shading for leaves and grass is now done in the deferred stage. Obviously this potentially allows us to extend it even further to other types of materials like cloth.
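To give an idea of what the material ID buys us, here is a rough C++ model of the lighting stage branching on it. The ID values, the `DepthSample` struct and the wrap lighting used for foliage are all placeholders I made up for this sketch; they are not our actual leaf and grass shading. Small integer IDs are a safe fit for a 16 bit float channel since those are represented exactly.

```cpp
#include <algorithm>

// Hypothetical material IDs written into the second G-Buffer channel.
enum MaterialId { MAT_DEFAULT = 0, MAT_FOLIAGE = 1 };

// Two channels, modelled after a two channel float format such as PF_FLOAT16_GR.
struct DepthSample {
    float depth;      // first channel: depth
    float materialId; // second channel: material ID stored as a float
};

// Diffuse term chosen per material in the deferred lighting pass.
float diffuseTerm(const DepthSample& sample, float nDotL)
{
    // Round to the nearest integer ID to be safe against float noise.
    int id = static_cast<int>(sample.materialId + 0.5f);
    switch (id) {
    case MAT_FOLIAGE:
        // Placeholder custom shading: wrap lighting so back-lit leaves are not black.
        return std::min(1.0f, std::max(0.0f, nDotL * 0.5f + 0.5f));
    default:
        // Standard Lambert term.
        return std::max(0.0f, nDotL);
    }
}
```

In a real shader this would be a branch (or a lookup) inside the light pass, so every extra material type adds cost to every light.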

Unfortunately, since we are trying to limit the G-Buffer's fatness, we're limited in the number of channels we have to store data. This means that any material type that requires extra info cannot be integrated unless we introduce 64bit buffers.

At any rate, with this done, our artists can now happily do outdoor night scenes that have lights affecting grass and tree leaves. My next move is to get the major feature of deferred lighting in: spot and point lights. Since this blog post is already getting long, I'll leave my local lighting thoughts to my next post. So stay tuned. ;-)
Posted by Lf3T-Hn4D
I finally got to the stage of being able to render our first level using the deferred lighting renderer. It is not complete yet, however. Only the default materials required for level01 have been updated. Also, due to these changes, refraction needs to be redone with a different approach.

Along the way, I actually hit a few problems. The first one was due to my own mistake. When designing the G-Buffer, I assumed I would be able to construct the lighting pass with just the depth and normal. I was wrong. I needed the specular power value as well. Hence, I had to go back to the drawing board and decided to go for the least accurate model, R8G8B8A8, where we store the compacted view space normal in the RGB channels and the spec power in the alpha channel. Interestingly, it turned out pretty good. So much for the "not accurate" crap mentioned in the CryEngine3 PowerPoint presentation slides. Personally, I find the inaccuracy not really distinguishable. Besides, with proper good textures provided by artists, this small error isn't really a big deal, especially considering that we are not trying to achieve realism. What we want is beauty and style. :-)
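For reference, the packing looks roughly like this when modelled in C++. This is a sketch under assumptions rather than our shader code: the normal is remapped from [-1, 1] to [0, 1] per channel, and the spec power is scaled by a hypothetical maximum (`kMaxSpecPower`, picked arbitrarily here) before being quantised to 8 bits.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

struct Vec3f { float x, y, z; };
struct Rgba8 { std::uint8_t r, g, b, a; };

// Hypothetical upper bound for specular power; the real range is a content decision.
const float kMaxSpecPower = 128.0f;

// Quantise a value in [0, 1] to 8 bits.
static std::uint8_t toByte(float v)
{
    float clamped = std::min(1.0f, std::max(0.0f, v));
    return static_cast<std::uint8_t>(std::lround(clamped * 255.0f));
}

// G-Buffer write: view space normal in RGB, spec power in alpha.
Rgba8 packNormalSpec(const Vec3f& n, float specPower)
{
    return { toByte(n.x * 0.5f + 0.5f),
             toByte(n.y * 0.5f + 0.5f),
             toByte(n.z * 0.5f + 0.5f),
             toByte(specPower / kMaxSpecPower) };
}

// Lighting pass read: expand back to [-1, 1] and renormalize to hide quantisation error.
Vec3f unpackNormal(const Rgba8& p)
{
    Vec3f n = { p.r / 255.0f * 2.0f - 1.0f,
                p.g / 255.0f * 2.0f - 1.0f,
                p.b / 255.0f * 2.0f - 1.0f };
    float len = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
    return { n.x / len, n.y / len, n.z / len };
}

float unpackSpecPower(const Rgba8& p)
{
    return p.a / 255.0f * kMaxSpecPower;
}
```

With 8 bits per channel each normal component is off by less than a hundredth after the round trip, which matches my impression that the error is hard to spot on textured surfaces.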

Another issue of mine was the way Ogre does its rendering. For every render target, Ogre does a scene traversal to find all visible renderables and render them. I found this unacceptable, since it means Ogre would traverse the scene at least twice: first for the G-Buffer stage, then for the final compositing and forward rendering stage. This is a waste of CPU resources, so I ended up listening to the render queue events during the G-Buffer pass and keeping my own copy of all queued renderables. Then I manually inject the render queue during the final compositing stage with a custom subclassed SceneMgrQueuedRenderableVisitor that tries to refetch the right material technique based on the new material scheme.

And the end result? I had our first level running at ~40-70fps with an average of 50fps at 1024x768. This is with HDR on but without shadows. Not too bad for a crappy 8600GTS.

Oh, one interesting thing to note is that since the G-Buffer stage does not require much texture sampling, it actually renders really fast. That being so, and because we keep the Z-Buffer intact throughout the whole process, we actually gain some performance during the final compositing pass due to early Z-out. So you lose some, you win some. :-P (In theory, if you do a Z-only pre-pass before filling the G-Buffer, you might speed things up more if your scene is complex. But it will also increase the batch count, so I'm not too sure it's worthwhile.)

Unfortunately for us, because we did not plan for deferred lighting from the start, we had to abide by some bad decisions made in the past. One notable issue is that our G-Buffer stage requires the diffuse, normal and spec maps in the worst case scenario. This is because, in the case of an alpha-rejection shaded material, we need to sample the alpha channel of the diffuse map for alpha-rejection, and the specular power from the spec map. This means that we are sampling at least two textures for each material during the G-Buffer stage. This is not ideal, as we should try to sample as few textures as possible in this pass.

That said, if I could fix this, I would make specular power part of the normal map's alpha instead of the spec map's alpha, so that typically only one texture sample is needed in the G-Buffer stage. This would also free up the spec map's alpha for a reflection factor for envmap reflective materials, which would be a win-win solution (one less reflection factor texture map). Sadly, we're already a long way into art asset creation. Changing this now would mean loads of work fixing the old textures and materials.
Posted by Lf3T-Hn4D
Liquid Rock Games and Project Aftershock. All Rights Reserved.