Tuesday, March 16, 2010 at 4:26 AM
Local lights are the main reason deferred lighting is so appealing. Because they are rendered in screen space, their cost is bound by fillrate rather than by vertex processing. In a forward renderer, by contrast, the whole geometry has to be sent to the GPU again for every light that affects it.

However, deferred lighting runs into the same fillrate issue that is typically encountered with particle effects. In the naive approach, one would render a quad on the screen for each light. This works well for small attenuating lights, but not at all for lights that cover a large portion of the screen. Hence we would like a solution where only the affected pixels are rendered.

The answer to that problem is light volume rendering. Instead of rendering quads on the screen, we render volume meshes that represent the lights: a sphere for a point light and a cone for a spot light. Since they are real meshes rendered in the scene, we can use the Z-buffer depth test to cull areas that are occluded (Figure 1 - A). Unfortunately, this is not optimal most of the time. With plain volume mesh rendering, unaffected surfaces behind the volume mesh still get shaded needlessly. An alternative is to render the backfaces of the volume with a "greater equal" depth test (Figure 1 - B). This culls the unnecessary surfaces behind the light volume, but at the cost of never culling surfaces in front of it. Hence neither naive method solves the problem on its own.

To illustrate, let's look at a given scenario in 2D:

Figure 1 - Naive light volume approaches
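In case the figure is hard to picture, here is a rough sketch of the two naive approaches for a single pixel. It is only an illustration, not engine code: depths are plain numbers along one view ray (smaller is closer to the camera), and the function names and sample values are made up for this example.

```python
# Depths are measured along one view ray; smaller means closer to the camera.
# FRONT/BACK are where the ray enters and leaves the light volume (made-up values).

def front_face_pass(front: float, scene: float) -> bool:
    """Approach A: draw front faces with the usual 'less equal' depth test."""
    return front <= scene

def back_face_pass(back: float, scene: float) -> bool:
    """Approach B: draw back faces with a 'greater equal' depth test."""
    return back >= scene

FRONT, BACK = 4.0, 6.0
cases = {"in front of volume": 2.0, "inside volume": 5.0, "behind volume": 8.0}

# For each scene surface: (shaded by approach A, shaded by approach B)
naive = {
    name: (front_face_pass(FRONT, depth), back_face_pass(BACK, depth))
    for name, depth in cases.items()
}
```

Only the surface inside the volume should be lit. Approach A also shades the surface behind the volume, and approach B also shades the surface in front of it.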

Interestingly, there are a few methods to solve this, and they all use the stencil buffer. The idea in fact comes from the good old stencil shadow volume rendering technique. There are many articles out there on this topic if anyone is interested. To summarize, the two more popular techniques are the depth fail hybrid and the XOR stencil technique.

Let's start with the first. To understand this technique, we first need to take note of one interesting fact. If you look closely at Figure 1, you will realize that we can get the desired result by combining the two naive approaches with an AND logic operation!

Figure 2 - AND logic operation

Now with that in mind, let's talk about how the depth fail hybrid technique works. I call it a hybrid because it does not run the full depth fail stencil volume technique, which uses two stencil render passes. Instead, it applies only the first depth fail stencil pass and then combines the second depth test pass with the lighting render while applying the generated stencil mask. Effectively this is equivalent to the AND logic operation, but with better fillrate usage because it needs one stencil pass instead of the two used for stencil shadows.

Figure 3 - Z-Depth fail hybrid
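The two passes can be sketched per pixel as below. This is a minimal CPU-side simulation under simplifying assumptions (one convex, closed volume, camera outside it, depths as plain numbers along the view ray). The function name and values are hypothetical, and a real implementation would set the equivalent depth and stencil states on the GPU instead.

```python
def depth_fail_hybrid(front, back, scene_depths):
    """Return one lit/unlit flag per pixel for a convex, closed light volume.

    front, back  -- depth where the view ray enters / leaves the volume
    scene_depths -- depth buffer value of each pixel (smaller is closer)
    """
    # Pass 1 (stencil only): draw the back faces with the usual 'less equal'
    # depth test and mark the stencil wherever that test FAILS, i.e. wherever
    # the scene surface lies in front of the back face.
    stencil = [back > scene for scene in scene_depths]

    # Pass 2 (lighting): draw the front faces with the 'less equal' depth test
    # and shade only where the stencil was marked in pass 1.
    return [marked and front <= scene
            for marked, scene in zip(stencil, scene_depths)]

# Surfaces in front of, inside and behind a volume spanning depths 4..6.
lit = depth_fail_hybrid(4.0, 6.0, [2.0, 5.0, 8.0])
```

Here `lit` comes out as `[False, True, False]`: only the surface between the front and back faces is shaded, which is the AND of the two naive tests.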

Let's get on to the second technique. It is actually less common because it is more fillrate intensive than the previous one. The concept is to flip the stencil value without triangle face culling. Hence, it pretty much mimics the XOR logic operation. Truth be told, this technique can be done in two ways: depth pass (Figure 4 - first row) or depth fail (Figure 4 - second row).

Figure 4 - XOR logic operation
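As a rough per-pixel sketch of the XOR idea (again just a simulation with made-up names and values, not actual render state code): every face of the volume is drawn without culling, and the stencil bit is inverted either when the depth test passes or when it fails.

```python
def xor_stencil(face_depths, scene, flip_on="fail"):
    """Stencil bit of one pixel after drawing all faces of the light volume.

    face_depths -- depth of every volume face the view ray crosses (no culling)
    scene       -- depth buffer value of the pixel (smaller is closer)
    flip_on     -- "pass" inverts on depth pass, "fail" inverts on depth fail
    """
    stencil = False
    for face in face_depths:
        passed = face <= scene              # usual 'less equal' depth test
        if passed == (flip_on == "pass"):   # does this outcome trigger the flip?
            stencil = not stencil           # invert, like an XOR with 1
    return stencil

faces = [4.0, 6.0]                          # ray enters at 4, leaves at 6
scenes = [2.0, 5.0, 8.0]                    # in front, inside, behind
by_pass = [xor_stencil(faces, s, "pass") for s in scenes]
by_fail = [xor_stencil(faces, s, "fail") for s in scenes]
```

With the camera outside the volume, both variants leave the stencil set only for the surface inside the volume. The other two surfaces get either zero or two flips and end up cleared.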

From what I gather, the latter seems to be the implementation of choice, just as with the Z-depth fail technique. If I understand it correctly, this is due to typical scene structure, where the camera mostly looks at sparse areas rather than having plenty of near-eye occlusion. Hence it saves some fillrate in the stencil pass. Also, an interesting note is that using Z-depth fail for this XOR method actually avoids the problems that occur when the camera is within the light volume.

Having said that, the light volume stenciling technique obviously does not follow the stencil shadow techniques exactly. Because of this, the two techniques above come with some limitations.

For the Z-depth fail hybrid technique, the system assumes an enclosed convex light volume. It is also not applicable when the camera is inside the light volume (the front face is behind the camera, resulting in no lighting).

For the XOR Z-depth fail technique, the system assumes an enclosed light volume. However, due to the nature of the XOR operation, the light volume can be concave as long as there are no intersection issues. This is an interesting behavior, as it means we could have specially controlled light volumes that will not bleed into areas we do not want! (Think point and spot lights that have special light volumes to avoid bleeding into adjacent rooms.) Adding on to that, as mentioned before, this technique does not suffer from the camera-in-light-volume glitch. Obviously there's always a catch when things look so good. The fact that we have to turn off backface culling during the stencil pass means that our stencil rendering is more expensive.
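To see why the depth fail variant copes with concave volumes and with the camera inside the volume, it helps to think of the flips as a parity count. The sketch below is only an illustration under my own assumptions (depths as numbers along one view ray, hypothetical function names, and faces behind the camera simply missing from the list because they are clipped).

```python
def xor_depth_fail(face_depths, scene):
    """Depth fail variant: every face farther away than the surface flips the bit."""
    return sum(face > scene for face in face_depths) % 2 == 1

def xor_depth_pass(face_depths, scene):
    """Depth pass variant: every face at or in front of the surface flips the bit."""
    return sum(face <= scene for face in face_depths) % 2 == 1

# Concave volume: the ray enters at 2, leaves at 4, enters again at 6, leaves at 8.
concave = [2.0, 4.0, 6.0, 8.0]
concave_lit = [xor_depth_fail(concave, s) for s in (1.0, 3.0, 5.0, 7.0, 9.0)]

# Camera inside a convex volume: the front face is behind the camera and gets
# clipped, so only the back face at depth 6 is ever rasterized.
inside = [6.0]
fail_inside = [xor_depth_fail(inside, s) for s in (5.0, 8.0)]
pass_inside = [xor_depth_pass(inside, s) for s in (5.0, 8.0)]
```

In the concave case only the surfaces at depths 3 and 7 end up marked, so the gap between the two lobes stays unlit. With the camera inside the volume, depth fail still marks the surface at depth 5 and skips the one at depth 8, while depth pass gets both wrong.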

Phewh~ this is a long topic and I'm still not done yet. To think most reference slides describing these methods I could find online discussed them in two slides. O_O Anyways, to conclude for this post, I've outlined two methods that I will integrate in our deferred lighting pipeline. My next blog post will talk about when I would use them. ;-)
Posted by Lf3T-Hn4D


Mirko said...

Hi Lf3T-Hn4D,

I've two comments on this.
First of all, thanks for your comprehensive explanation of this problem :-)

1. While thinking about this problem, I wonder why it is not feasible to tackle the issue by discarding the fragments of the light geometry when they are drawn (front face). I fear I'm missing something here, but why don't you check in the light geometry fragment program whether the underlying fragment's world position is outside the light volume, and then discard the fragment directly without involving stencil tests?

2. Did you already address the precision issues which seem to appear when the camera is located at the bounds of a light geometry?

March 25, 2010 at 1:56 AM  
Lf3T-Hn4D said...

Hey Mirko,

To answer your questions:
1. The reason is that we want to avoid going into fragment processing when it is not necessary. The issue with fillrate is having to execute the fragment shader, which is slow. So if we can use the early Z test to avoid it, that helps a lot. It is faster than sampling the position from the G-Buffer and comparing depth in the fragment shader. Obviously this is only true if your light volume covers a huge part of the scene. If it only covers a small region, it's better to just batch the lights together and render with front faces, since the stencil state switching might just slow things down. Hence there's no real end-all solution here. You have to use different techniques for different cases for optimal performance.

2. Yes, I did. I simply do a rough estimate of the light volume vs near clipping plane test. For a point light, I do a simple sphere plane intersection test. For a spot light, I first do the sphere plane test. When it intersects, I do a sphere cone test where the sphere is the estimated camera bound including the near clip (calculated from the near clip frustum corner). When these tests pass, i.e. an intersection happens, I simply skip stenciling and render the light volume with backfaces (cull frontfaces) and a reversed Z test.
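For reference, a sphere plane intersection test along those lines could look something like the sketch below. This is an assumption about the general shape of such a test, not the actual code from the engine: the plane is given as n·x + d = 0, the near plane is treated as an infinite plane (which is conservative), and the function name and the view space setup are made up for illustration.

```python
import math

def sphere_intersects_plane(center, radius, plane_normal, plane_d):
    """True if the sphere touches the plane n.x + d = 0.

    The normal does not have to be unit length; it is normalized here.
    """
    nx, ny, nz = plane_normal
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    signed_distance = (nx * center[0] + ny * center[1] + nz * center[2]
                       + plane_d) / length
    return abs(signed_distance) <= radius

# Assumed view space setup: camera at the origin looking down -Z,
# near plane at z = -0.1, i.e. normal (0, 0, -1) and d = -0.1.
NEAR_NORMAL, NEAR_D = (0.0, 0.0, -1.0), -0.1

far_light = sphere_intersects_plane((0.0, 0.0, -5.0), 2.0, NEAR_NORMAL, NEAR_D)
near_light = sphere_intersects_plane((0.0, 0.0, -1.0), 2.0, NEAR_NORMAL, NEAR_D)
```

The light five units away does not touch the near plane, so it can go through the stencil path. The light one unit away does, so it would take the fallback of rendering backfaces with the reversed Z test.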

March 25, 2010 at 2:19 AM  
Mirko said...

I understand. I wasn't aware of the performance differences between fragment programs and stencil tests.

Your initial tests for light geometry clipping are quite complex. Somehow I hoped deferred shading would eventually be easier to implement, especially with the lights drawn as geometry. But I see that there are a lot of things which can be avoided when carefully checked first.

Thank you for the insights!

March 25, 2010 at 2:06 PM  
Lf3T-Hn4D said...

You're welcome. :-) The light geometry clipping test isn't really that complicated. The sphere plane intersection test in particular is actually quite cheap. I believe it's worthwhile if we can gain more efficient fill rates. Besides, you most likely want to test this only for lights that cover a big portion of the screen, so not all lights will be using this test. However, if you feel this is just too much CPU processing, you could always do a simple bounding sphere test for both light types. It'll have more false positives, but that probably wouldn't matter since, when it happens, the light is probably near enough to the camera that the stencil culling does not help any more. This is just speculation though. We'll never know unless we do a proper test. This is probably a good topic for a thesis. :-)

March 25, 2010 at 2:43 PM  


Liquid Rock Games and Project Aftershock. All Rights Reserved.