Sunday, February 12, 2012

Flying through ModernCity

The whole "send surfaces only instead of whole chunks" bit now works properly, so I thought I'd show you guys a video.



(Omagosh I'm allowed to embed Youtube videos again? yay!)

It seems the screen recorder didn't only kill the frame rate as per usual, but also messed with the occlusion queries. So you'll see the occasional hole in the terrain. Let's see if I can figure out why this happens...

Edit: Fixed it by changing the occlusion system. We'll leave that for another blog post. In the meantime:



Sunday, February 5, 2012

How to draw a cube


I'm not the only one who tried to draw a couple of cubes after seeing Minecraft, and every now and then I get questions from other programmers about how to structure the data and make it fast. So let's take a look at the basics of cube rendering.


Structure and storage


At the bottom level, we have a block, which is nothing more than a 16-bit value.  This means we can have over 65,000 block types, although special block types could take up multiple slots.  (For example, a staircase that can be placed in four directions would require four ID numbers.)  There is a separate table that holds all the info for a given block type, such as the name, durability, transparency, the collision box, Lua hooks, etc.

Then there's the chunk, a group of blocks arranged in a 16×16×16 cube.  Or at least, that's the concept.  In reality, it is an array of block types, and its size is fixed at 16³ = 4096 elements.  (At 16 bits for a block type, that's 8192 bytes).  The (x,y,z) position of a block is implicit: it is located in the array at index [x + y * 16 + z * 256].

Then, at the highest level, there's the game world.  This is a really, really big cube (4.3 billion blocks in every direction: a few times larger than our own sun), and it's sliced up in chunks.



Unlike blocks, we don't have to keep all 19 septillion chunks in memory at all times.  Chunks are positioned explicitly, meaning we store its position as well.  This allows us to load and unload small sections of the game world as needed.  I store them in a database, where the position is the primary key, and the chunk data is a blob of compressed data. Until recently it used 'deflate' as its compression algorithm.  I switched to LZ4, because it is much faster (7 to 10 times) and compression is only a few percent worse on average.


Rendering

Drawing the cubes on screen seems simple enough.  Load the chunks that are near the player, then go through every block in every chunk.  Grab the right texture for that block type, draw a cube at this position (the chunk's own coordinates, plus the implicit position of the block). Ta dah.

This is really, really slow.  We're drawing a lot of stuff the player can never see, so this is also easy to speed up.  The only thing that is interesting for rendering, are the faces at the surface.  So we still check every block, but we also look at the six neighbors.  If the neighbor is of a different block type, and it is transparent, then we draw the face.  Surfaces are stored as a list of a 16-bit chunk array index, plus a byte that indicates which of the six faces are exposed in the lowest 6 bits.


Much better already!  Combined with frustum culling and occlusion culling (described in detail in an earlier post) performance will be boosted even further.

Opaque and transparent objects need to be drawn separately, so you'll have to generate two surfaces for each chunk.  First you draw all opaque surfaces, in near-to-far order.  Then you turn off z-buffer writes, and draw all transparent surfaces in reverse order, far-to-near.


Surface extraction

The step that converts a chunk to the visibile surfaces is now our new bottleneck.  I know others have experimented with offloading this to the GPU, but this requires a rather modern card, and I'd like to support at least OpenGL 2.1.  So for now, we're still CPU bound.

Hexahedra always runs in a client-server setup, even for single player.  The server also looks for a chunk's surface, even though it doesn't render anything.  It uses this for calculating light, and for collision handling. Now why don't we store this information in the database as well, and send only the surfaces to the client?

This has several advantages.  Clients don't have to go through the same expensive step anymore. Network load will be reduced; a surface is usually smaller than the chunk it was generated from, with only a couple of worst-case scenarios as the rare exception.  Cheating will be harder, since clients cannot see ores or other valuables hidden inside the walls.



Hope you enjoyed this quick peek under the hood!



Sunday, September 25, 2011

GLSL

Most people have already seen GLSL in action in several Minecraft mods, ranging from bump- and parallax mapping to bending the world into one giant acid trip. Other games, such as Mythruna, use it out of the box for effects like HDR bloom.

Personally, I prefer Hexahedra's current look, with the simple textures and the blocky lighting. But there are several other interesting things possible that have nothing to do with aesthetics.

One of them is the use of texture arrays. An old technique to speed up rendering is to use a texture atlas: a large texture that combines several smaller ones. The drawback is that such textures cannot be tiled, so every cube has to be drawn separately. Minecraft uses (used?) a texture atlas that is exactly one tile wide, so the textures can at least be tiled horizontally. Strips of cube faces can be merged together, which cuts back on the amount of work the GPU needs to do.


A texture array is a kind of 3-D texture. The individual textures are arranged in a stack. Each slice can now be tiled both horizontally and vertically, so we can merge even more faces. The gains are the most obvious for open water, where 256 cubes with the same texture and lighting can be merged into a single, large square.

And in one fell swoop, it also solves all problems with textures bleeding into each other at the edges because of mipmapping and rounding errors.

There's another advantage. Every corner used to be stored as three shorts (x, y, z) for the position, two floats (u, v) for the coordinates in the texture atlas, and three bytes (r, g, b) for the lighting. That's 68 bytes per face.

Such restrictions no longer exist in GLSL. We can put the coordinates in bytes. (I'd put them in nibbles, but the range is 0-16, not 0-15). Now that we're using texture arrays, we need a short to pick a slice, and two nibbles for the u,v coordinates. Three more nibbles for the lighting: ambient, sunlight, and artificial light. The actual light colors are grabbed from uniforms, global values we need to set only once. So we're down to 32 bytes per face. Nice!

Hmm... The u,v coordinates now move in lockstep with the vertex positions. It turns out that by storing a normal, we can go from world coordinates to texture coordinates by a simple multiplication. We only have six possible normals, so that's three bits. And we're down to 28 bytes per face. Even better, we can now use the normals for the ambient lighting as well. Dusk and dawn are going to look spectacular if we can give the west (c.q. east) faces a red or pink hue.



I look at the top of the screen where it says "20 FPS". Frown. Revert to previous shaders. 60 FPS. The fuck?

Turns out a simple lookup in a table with six 2x3 matrices is painfully slow in GLSL. It might not be worth the extra 4 bytes per face, but it would be neat to have anisotropic ambient light. Perhaps a lookup in a small 1D texture would solve this? I'll have to spend some more time on this.

Tuesday, July 26, 2011

New test data

There has been a lot of activity recently around Cubic Chunks, a Minecraft mod that raises the height limit to 4096. Some of the old yMod maps have been dusted off and converted to this new format. Since it is based on McRegion, it isn't difficult to import in Hexa.

One of those maps is ModernCraft. I just wanted to show some screenshots, because it looks awesome.




Tuesday, July 19, 2011

Occlusion culling

The best way to speed things up is to send, load, and draw as little as possible. Hexahedra now uses hardware occlusion culling at the client side to determine which chunks to fetch over the network and show to the player.

Hardware occlusion checking is done by drawing the scene first, and then pretending to draw a box (or any convenient primitive) around whatever you want to cull. The GPU then tells you if this box would have been visible or not, had it actually been drawn. If it is visible, you can then proceed to load and draw the actual million-triangle object as usual.


How not to do it

It would be nice if we could check if the 16x16x16 bounding box of a chunk is visible, before we grab it from the server, decompress it, make a mesh, and push it to the graphics card.

We run into a couple of problems right away. First, to make sure the check is useful, the chunks that could potentially occlude it must be drawn first. For a chunk somewhere straight in front of the player, there's only one potential occluder. (For example, the chunk at {2,0,0} is only covered by the one at {1,0,0}.) Most chunks will have 2 or 3 potential occluders, though. And to determine whether to draw them or not, we'll have to check their occluders first...

Technical complications aside (chunks cannot be turned into meshes until their six neighbors are loaded, both the GPU queries and network queries are asynchronous, etc.) it is easy to see this is tricky to get working. And it doesn't work that well either. Suppose we have a chunk A, and three neighboring chunks B, C, and D that are A's potential occluders. B and C are visible, but they're on the edge of the screen. D is just off the edge of the screen, so we can't do a visibility check just yet. But until it becomes visible, we can't issue the query for A. (Well, we could,  but that would result in way too many queries and loads of false positives.)  However, if B and C don't cover A completely, we will see an ugly gap until the player turns his head far enough for D to enter the screen.

So that's where I lost a lot of time last month. :P


Better occlusion checking

The solution is not to check the bounding boxes, but the faces that touch the chunks that are already visible. Here's an illustration of how it is implemented right now:


Step 1: the player is standing in this chunk, which is of course visible to the player. Six queries are issues, one for every face of this chunk. (In the illustration you see only four). The queries are shown as lines. The dotted lines are outside the player's view, so we can't resolve them yet. The solid lines, however, will be answered by the GPU soon.

Step 2: The GPU has told us both faces are visible, so we fetch the corresponding chunks from the server, and draw them. For the new visible chunks, we issue new visibility queries. There are now four queries underway, and four are on hold.

Step 3: Some of the queries of chunk 1 have returned, and two faces are invisible. We mark the chunks as occluded, and we don't load them or issue any further queries for them.

Step 4: More data comes back, but in the meantime the player turns his head to the right. The face on the right enters the screen, and the query becomes active.

Step 5: The face is visible, and chunk 2 is loaded and drawn. Three new queries are sent to the GPU, including the one for the face that borders the occluded chunk above it.

Step 6: That face is visible, so chunk 3 is now marked visible as well. The other two faces were occluded.


Coming up

It appears there is still a problem where I get false negatives back. This is annoying, because the terrain shows a hole at that point. Here's a screenshot with one of such hiccups. The queries are drawn in green, for debugging purposes.



Usually a shake of the mouse solves it, and the algorithm chugs along as if nothing happened, but it's really annoying and ugly.  Also, I need to integrate the heightmap, so I can skip air chunks. (As you can see, there's a lot of unnecessary checking going on up in the sky. This will speed things up tremendously.)

Once that's taken care of, I hope to be able to post some videos showing all this in action, and also some statistics of just how much memory and triangles this saves under realistic conditions.

Tuesday, May 24, 2011

Scripting

Luabind is awesome.

I've spent some time with this library, and it's a really amazing C++ wrapper for Lua. Binding functions, classes, even callbacks and anonymous functions, it's a breeze.

Materials were never supposed to be hardcoded, but now the framework finally allows you to specify your own in a Lua script:

define_material(1, { name = "dirt", texture = { 1 }, hardness = 1.5 })
define_material(2, { name = "grass", texture =  { 1, 0, 1 } })
define_material(3, { name = "water", texture =  { 239 }, transparency = 0.5, viscosity = 0.2 })

...And so on. My first idea was to read this from the world database, or an XML file, but nothing is as convenient and flexible as this.

(The "texture" property is an array with 1 to 6 indices to the texture atlas. If only one value is given, all faces of the cube will use that texture. In the case of grass, texture '1' is used for the sides, '0' for the top, and '1' again for the bottom. An array with 6 values would specify different textures for every face.)

Strictly speaking, this isn't real scripting yet, just configuration. So let's take a look at events:

function launch (player)
    player:change_speed(0,0,10)
    player:shout("WHEE!")
end

on_approach(40, 40, 0, 3, 6, launch)

On_approach calls a function whenever a player gets near (40, 40, 0), within 3 blocks distance. (The coordinates are given relative to the center of the world). Any player entering this area will be launched into the air, by adding 10 m/s to their vertical speed. This event won't fire again until the player is further away than 6 blocks.

Similar triggers could be added to materials, items, and mobs. This, for example...

define_material(2, { on_touch = launch })

would be really, really annoying. But it would not overwrite anything from the first definition, it would only add a trigger.

Next thing I'll need to do is look at security. It's very important that the scripts are properly sandboxed. Also, I should study Gary's Mod, to get an idea of how to properly design such an API.


Edit: I'm jotting down some ideas here.