Tuesday, July 19, 2011

Occlusion culling

The best way to speed things up is to send, load, and draw as little as possible. Hexahedra now uses hardware occlusion culling on the client side to determine which chunks to fetch over the network and show to the player.

Hardware occlusion checking is done by drawing the scene first, and then pretending to draw a box (or any convenient primitive) around whatever you want to cull. The GPU then tells you if this box would have been visible or not, had it actually been drawn. If it is visible, you can then proceed to load and draw the actual million-triangle object as usual.
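To make the idea concrete, here is a toy CPU stand-in for such a query. It is only a sketch: a real query runs on the GPU against the real depth buffer, and every name below (make_depth_buffer, draw_rect, query_rect) is made up for illustration. The "scene" is a screen-aligned rectangle, and smaller depth values are assumed to be closer to the camera.

```python
# Toy software version of an occlusion query. All names are hypothetical;
# this only mimics what the GPU does with its depth buffer.

def make_depth_buffer(width, height, far=float("inf")):
    """Create an empty depth buffer: every pixel starts at the far plane."""
    return [[far] * width for _ in range(height)]

def draw_rect(depth, x0, y0, x1, y1, z):
    """Really draw a rectangle at depth z: depth test plus depth write."""
    for y in range(max(y0, 0), min(y1, len(depth))):
        row = depth[y]
        for x in range(max(x0, 0), min(x1, len(row))):
            if z < row[x]:
                row[x] = z

def query_rect(depth, x0, y0, x1, y1, z):
    """Pretend to draw a rectangle: count pixels that pass, write nothing."""
    passed = 0
    for y in range(max(y0, 0), min(y1, len(depth))):
        row = depth[y]
        for x in range(max(x0, 0), min(x1, len(row))):
            if z < row[x]:
                passed += 1
    return passed

depth = make_depth_buffer(8, 8)
draw_rect(depth, 0, 0, 6, 8, z=1.0)              # a wall close to the camera
hidden = query_rect(depth, 1, 1, 5, 5, z=5.0)    # box entirely behind the wall
peeking = query_rect(depth, 4, 1, 8, 5, z=5.0)   # box sticking out past the wall
print(hidden, peeking)
```

A result of zero means the box is fully hidden and the expensive object behind it can be skipped; anything above zero means at least part of it would show.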

How not to do it

It would be nice if we could check if the 16x16x16 bounding box of a chunk is visible, before we grab it from the server, decompress it, make a mesh, and push it to the graphics card.

We run into a couple of problems right away. First, to make sure the check is useful, the chunks that could potentially occlude it must be drawn first. For a chunk somewhere straight in front of the player, there's only one potential occluder. (For example, the chunk at {2,0,0} is only covered by the one at {1,0,0}.) Most chunks will have 2 or 3 potential occluders, though. And to determine whether to draw them or not, we'll have to check their occluders first...
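One way to read "potential occluders" is: the face neighbors that sit one step closer to the player along each axis where the chunk is offset from the player. The sketch below works under that assumption, with the player's chunk at {0,0,0}; the function name is hypothetical.

```python
# Sketch, assuming a chunk's potential occluders are the face neighbors one
# step closer to the player along every axis with a nonzero offset.

def potential_occluders(chunk, player=(0, 0, 0)):
    """Return the neighboring chunks that could hide `chunk` from `player`."""
    occluders = []
    for axis in range(3):
        delta = chunk[axis] - player[axis]
        if delta != 0:
            step = 1 if delta > 0 else -1
            neighbor = list(chunk)
            neighbor[axis] -= step      # move one chunk towards the player
            occluders.append(tuple(neighbor))
    return occluders

print(potential_occluders((2, 0, 0)))   # straight ahead: one occluder
print(potential_occluders((2, 3, 0)))   # off to the side: two
print(potential_occluders((2, -3, 1)))  # fully diagonal: three
```

Each of those occluders has occluders of its own, which is where the dependency chain described above comes from.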

Technical complications aside (chunks cannot be turned into meshes until their six neighbors are loaded, both the GPU queries and network queries are asynchronous, etc.), it is easy to see this is tricky to get working. And it doesn't work that well either. Suppose we have a chunk A, and three neighboring chunks B, C, and D that are A's potential occluders. B and C are visible, but they're on the edge of the screen. D is just off the edge of the screen, so we can't do a visibility check on it just yet. But until D becomes visible, we can't issue the query for A. (Well, we could, but that would result in way too many queries and loads of false positives.) However, if B and C don't cover A completely, we will see an ugly gap until the player turns his head far enough for D to enter the screen.

So that's where I lost a lot of time last month. :P

Better occlusion checking

The solution is not to check the bounding boxes, but the faces that touch the chunks that are already visible. Here's an illustration of how it is implemented right now:

Step 1: The player is standing in this chunk, which is of course visible to the player. Six queries are issued, one for every face of this chunk. (In the illustration you see only four.) The queries are shown as lines. The dotted lines are outside the player's view, so we can't resolve them yet. The solid lines, however, will be answered by the GPU soon.

Step 2: The GPU has told us both faces are visible, so we fetch the corresponding chunks from the server, and draw them. For the new visible chunks, we issue new visibility queries. There are now four queries underway, and four are on hold.

Step 3: Some of the queries of chunk 1 have returned, and two faces are invisible. We mark the chunks as occluded, and we don't load them or issue any further queries for them.

Step 4: More data comes back, but in the meantime the player turns his head to the right. The face on the right enters the screen, and the query becomes active.

Step 5: The face is visible, and chunk 2 is loaded and drawn. Three new queries are sent to the GPU, including the one for the face that borders the occluded chunk above it.

Step 6: That face is visible, so chunk 3 is now marked visible as well. The other two faces were occluded.
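The six steps can be condensed into a small simulation. This is a simplified sketch, not the actual Hexahedra code: the GPU is replaced by a caller-supplied function face_query(chunk, direction) that returns True (face visible), False (face occluded), or None (face is off-screen, so the query stays on hold), and everything resolves synchronously. All names are hypothetical.

```python
from collections import deque

# The six face directions of a chunk.
DIRECTIONS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def add(a, b):
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

class FaceCuller:
    def __init__(self, start, face_query):
        self.face_query = face_query    # stand-in for the asynchronous GPU query
        self.visible = {start}          # the player's own chunk is always visible
        self.occluded = set()
        self.queue = deque()            # queries that can be resolved now
        self.on_hold = []               # queries for faces that are off-screen
        self._issue_queries(start)

    def _issue_queries(self, chunk):
        # One query per face, except faces that border an already visible chunk.
        # Occluded neighbors are queried again: another face may reveal them.
        for d in DIRECTIONS:
            if add(chunk, d) not in self.visible:
                self.queue.append((chunk, d))

    def step(self):
        """Resolve what can be resolved; return chunks newly marked visible."""
        loaded = []
        self.queue.extend(self.on_hold)     # retry faces that were off-screen
        self.on_hold = []
        while self.queue:
            chunk, d = self.queue.popleft()
            neighbor = add(chunk, d)
            if neighbor in self.visible:
                continue
            result = self.face_query(chunk, d)
            if result is None:
                self.on_hold.append((chunk, d))
            elif result:
                self.occluded.discard(neighbor)
                self.visible.add(neighbor)  # this is where we would fetch and draw
                loaded.append(neighbor)
                self._issue_queries(neighbor)
            else:
                self.occluded.add(neighbor)
        return loaded

# Fake "GPU": three faces are visible, one starts out off-screen.
open_faces = {((0, 0, 0), (1, 0, 0)), ((1, 0, 0), (0, 1, 0)), ((1, 1, 0), (-1, 0, 0))}
offscreen = {((0, 0, 0), (-1, 0, 0))}

def fake_query(chunk, d):
    if (chunk, d) in offscreen:
        return None
    return (chunk, d) in open_faces

culler = FaceCuller((0, 0, 0), fake_query)
first = culler.step()

# The player turns his head: the face on hold enters the screen and is visible.
offscreen.clear()
open_faces.add(((0, 0, 0), (-1, 0, 0)))
second = culler.step()
print(first, second)
```

In this run, chunk (0, 1, 0) is first marked occluded through the face it shares with the starting chunk, and later becomes visible through the face it shares with (1, 1, 0), much like chunk 3 in steps 5 and 6. The flood only stops because the fake query eventually answers False everywhere; a real client would presumably also cap the view distance.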

Coming up

It appears there is still a problem where I get false negatives back. This is annoying, because the terrain shows a hole at that point. Here's a screenshot with one such hiccup. The queries are drawn in green, for debugging purposes.

Usually a shake of the mouse solves it, and the algorithm chugs along as if nothing happened, but it's really annoying and ugly. Also, I need to integrate the heightmap, so I can skip air chunks. (As you can see, there's a lot of unnecessary checking going on up in the sky. Skipping those chunks will speed things up tremendously.)

Once that's taken care of, I hope to be able to post some videos showing all this in action, and also some statistics on just how much memory and how many triangles this saves under realistic conditions.

1 comment:

  1. It seems to me like you can save a trip to the GPU somewhere by doing an opacity check on the 16x16 blocks on the face of a chunk. Since you have to have the 6 adjacent chunks loaded to get the meshes, you should be able to trivially reject one or two of them this way.