Sunday, February 12, 2012

Flying through ModernCity

The whole "send surfaces only instead of whole chunks" bit now works properly, so I thought I'd show you guys a video.

(Omagosh I'm allowed to embed Youtube videos again? yay!)

It seems the screen recorder didn't only kill the frame rate as per usual, but also messed with the occlusion queries. So you'll see the occasional hole in the terrain. Let's see if I can figure out why this happens...

Edit: Fixed it by changing the occlusion system. We'll leave that for another blog post. In the meantime:

Sunday, February 5, 2012

How to draw a cube

I'm not the only one who tried to draw a couple of cubes after seeing Minecraft, and every now and then I get questions from other programmers about how to structure the data and make it fast. So let's take a look at the basics of cube rendering.

Structure and storage

At the bottom level, we have a block, which is nothing more than a 16-bit value.  This means we can have over 65,000 block types, although special block types could take up multiple slots.  (For example, a staircase that can be placed in four directions would require four ID numbers.)  There is a separate table that holds all the info for a given block type, such as the name, durability, transparency, the collision box, Lua hooks, etc.

Then there's the chunk, a group of blocks arranged in a 16×16×16 cube.  Or at least, that's the concept.  In reality, it is an array of block types, and its size is fixed at 16³ = 4096 elements.  (At 16 bits for a block type, that's 8192 bytes).  The (x,y,z) position of a block is implicit: it is located in the array at index [x + y * 16 + z * 256].

Then, at the highest level, there's the game world.  This is a really, really big cube (4.3 billion blocks in every direction: a few times larger than our own sun), and it's sliced up in chunks.

Unlike blocks, we don't have to keep all 19 septillion chunks in memory at all times.  Chunks are positioned explicitly, meaning we store its position as well.  This allows us to load and unload small sections of the game world as needed.  I store them in a database, where the position is the primary key, and the chunk data is a blob of compressed data. Until recently it used 'deflate' as its compression algorithm.  I switched to LZ4, because it is much faster (7 to 10 times) and compression is only a few percent worse on average.


Drawing the cubes on screen seems simple enough.  Load the chunks that are near the player, then go through every block in every chunk.  Grab the right texture for that block type, draw a cube at this position (the chunk's own coordinates, plus the implicit position of the block). Ta dah.

This is really, really slow.  We're drawing a lot of stuff the player can never see, so this is also easy to speed up.  The only thing that is interesting for rendering, are the faces at the surface.  So we still check every block, but we also look at the six neighbors.  If the neighbor is of a different block type, and it is transparent, then we draw the face.  Surfaces are stored as a list of a 16-bit chunk array index, plus a byte that indicates which of the six faces are exposed in the lowest 6 bits.

Much better already!  Combined with frustum culling and occlusion culling (described in detail in an earlier post) performance will be boosted even further.

Opaque and transparent objects need to be drawn separately, so you'll have to generate two surfaces for each chunk.  First you draw all opaque surfaces, in near-to-far order.  Then you turn off z-buffer writes, and draw all transparent surfaces in reverse order, far-to-near.

Surface extraction

The step that converts a chunk to the visibile surfaces is now our new bottleneck.  I know others have experimented with offloading this to the GPU, but this requires a rather modern card, and I'd like to support at least OpenGL 2.1.  So for now, we're still CPU bound.

Hexahedra always runs in a client-server setup, even for single player.  The server also looks for a chunk's surface, even though it doesn't render anything.  It uses this for calculating light, and for collision handling. Now why don't we store this information in the database as well, and send only the surfaces to the client?

This has several advantages.  Clients don't have to go through the same expensive step anymore. Network load will be reduced; a surface is usually smaller than the chunk it was generated from, with only a couple of worst-case scenarios as the rare exception.  Cheating will be harder, since clients cannot see ores or other valuables hidden inside the walls.

Hope you enjoyed this quick peek under the hood!