Creating a voxel OpenGL game engine

For the final project of our graphics class, I worked in a team of three to build a "mini" version of Minecraft. I handled chunking, rendering, gameplay systems, and UI.

The sun is rising on a hill. Trees with colored lights are in the distance. Clouds are in the sky. The player has glowstone selected in their hotbar.

First, let’s take a look at this snazzy showcase reel I made (it’s less than 3 minutes).

It does a nice job of demonstrating our engine’s features and gives credits where due. While we all contributed code, I focused primarily on the following features:

I wanna talk a bit about each, some implementation details, and any other interesting details.

Chunking

Just like Minecraft, we group the world terrain into 16x16x256 collections of blocks. This chunk data structure is really just a 1D array that stores the block type for a given coordinate. We have helpers that convert a (x, y, z) coordinate to an array index, and vice versa.

enum Block {
  EMPTY, STONE, DIRT, OAK_LOG, ETC
};

class Chunk
{
private:
  std::array<Block, 16 * 16 * 256> blocks;
};

A Block enum declares all possible block types in our game. The EMPTY block represents air, or the absence of a block. There are some benefits to this. For instance, breaking a block is the same as placing an EMPTY block!

Conceptually, our terrain should “own” these chunks, so they all live in a Terrain structure. For easy access, we represent a chunk by its lower-left xz-coordinate, and use smart pointers for safety.

class Terrain
{
private:
  std::unordered_map<glm::ivec2, std::unique_ptr<Chunk>> chunks;
};

Efficient rendering + face culling

Chunking our terrain helps a lot when we draw our blocks. For instance, my teammate implemented multithreading based on my chunks, which are easier to reason about than individual blocks. We render the terrain whole chunks at a time.

A block has six faces. Each face uses two triangles. If we render every face of every non-EMPTY block of every chunk, every single frame, our GPU would catch on fire and explode. There’s no point in rendering faces that are hidden by another block. The player never sees it.

Two blocks are right next to each other. The face in between does not need to be rendered.

Instead, we first iterate over every block in the chunk. For each non-EMPTY block, we check its six neighbors, and only render a corresponding face if its neighbor is EMPTY.

In the Chunk class, we store pointers to its neighbors. This way, we can check the neighbor of a block that may reside in a different chunk.

enum Direction {
  X_POS, X_NEG, Z_POS, Z_NEG, Y_POS, Y_NEG
};

// this really only makes sense for X_POS, X_NEG, Z_POS, and Z_NEG, but
// Y_POS and Y_NEG will be used later for other purposes.
std::unordered_map<Direction, Chunk*> neighbors;

If you were to noclip below the ground, it now looks “hollowed out.” What you’re seeing below is what our caves look like… on the outside! The player is technically inside solid STONE right now, but it’s just not being drawn.

Since we're not rendering any blocks inside the terrain, we can see snake-like caves underground.

I cannot emphasize how much this improves performance. On my laptop, my framerate basically tripled. This was a crucial step in getting our game to run well.

Texturing and block animations

I render chunks using indexed rendering and store all vertex attributes in a single interleaved VBO. This includes UVs and an animation flag that determines whether a block texture should move.

struct VertexAttrs
{
  glm::vec4 position;
  glm::vec4 normal;
  glm::vec2 uv;
  GLfloat animated;
}

// corresponds to the above and below
int stride = 4 + 4 + 2 + 1;

The stride parameter in glVertexAttribPointer was difficult to wrap my head around. All it does is tell the GPU when the data for the next vertex starts. To the GPU, our VBO is just a really, really long sequence of floats. So, we have to specify the size of each attribute.

In a stream of floats, we specify the first four to be for position, the next four to be for normals, and so on.

For example, position takes up four floats, while anim only uses one.

Block texture atlas

All the block textures in our game live in a 256x256 PNG sprite. Each block face texture is 16x16 pixels, so we can refer to specific textures using a coordinate system. For example, the dirt texture’s lower-left corner is located at (2, 1) on our imaginary grid.

A dirt face texture at coordinate (2, 1) corresponds to (2/16, 1/16) in UV coordinates.

We map a block type to the imaginary coordinate for each of its six faces.

std::unordered_map<Block, std::unordered_map<Direction, glm::ivec2>> block_uv_map;

// for example, this accesses the UVs for the Y_NEG face of a dirt block.
// this is where the enums really come in handy!
glm::ivec uv = block_uv_map[DIRT][Y_NEG];

Face textures are 1/16 of the whole sprite, so we multiply the coordinate by 1/16 to get the correct UV. Note that this only describes the lower-left corner of the block. We need to add 1/16 horizontally and vertically to get the other three corners.

In our engine, water and lava blocks are seemingly “flowing.” This effect was achieved by slowly translating over the whole texture sprite, moving the UV coordinates over time. The animation flag turns this on and off.

Handling transparent blocks

At first, transparent blocks like water and glass were overwriting opaque blocks from neighbor chunks. To fix this, we actually need to keep two separate VBOs and index buffers for opaque and transparent blocks.

We first draw all opaque blocks to screen, and then all transparent blocks. This is done through our Terrain.

void Terrain::render_terrain() {
  // get the list of chunks we actually want to render
  std::vector<Chunk*> chunk_list = get_chunks(player->position);

  // first render opaque blocks
  for (const Chunk* c : chunk_list) {
    c->render_opaque();
  }

  // then render transparent blocks
  for (const Chunk* c : chunk_list) {
    c->render_transparent();
  }
}

This solves our issue because transparent blocks now have the correct fragment color to overlay on top of.

Distance fog

If we fly really high up into the air, you’ll see a circular fog that surrounds the player and slowly fades blocks into the distance.

Blocks slowly fade into the distance in a circular radius around the player.

In the chunk fragment shader, I obtain the player’s world position and the fragment’s world position. I find the distance between these two, and divide by a max_distance to get a value between 0.0-1.0.

This value is used to lerp between the original block color and the sampled sky color. I use a nice cubic easing curve to make the fog get denser the further the block is from the player.

So, what is max_distance? At all times, we only render a specific number of chunks around the player: a 12x12 group of chunks, to be exact. The player is always at the center of this group, so the max distance (the circle’s radius) is always at most 96 blocks.

Diagram depicting the fog starting roughly around 96 blocks around the player in all directions.

So, the distance fog plays a second role: to stop the player from seeing chunks appear and disappear as they move. But, it’s worth it to implement for aesthetics alone, I think:

A beautiful sunrise over the desert.

Day/night cycle, celestial objects

Our engine features a day cycle. We see the sun during the day, and the moon and stars at night. Clouds are always visible. In between, sunsets and sunrises transition us nicely.

A diagram depicting what percentage of a day we spend in day, in night, and how long it takes to transition.

During development, I religiously followed the diagram above. It defines a single day as a range from 0.0-1.0, and specifies exact durations for transitions. For example:

The relative 0.0-1.0 scale allowed us to constantly change the actual length of the cycle (versus a hardcoded number), making debugging infinitely easier.

Skybox

The skybox is not a cubemap. We’re actually drawing to a screen space quad. To determine the color of each pixel, we cast a ray from the player camera and see what it intersects.

Depending on the direction of the ray, we can decide different colors to give to every single pixel. The picture below, for example, maps the literal ray direction to RGB values.

The sky is split into green, red, and black depending on the direction of the ray.

There’s a lot we can do here. For example, map a position to a UV coordinate, and output a different color depending on the value. Or, shooting a ray out and coloring the radius around that ray a different color, up to a certain angle (like a cone). I also lerp between colors depending on the time of day.

Sun, moon, and stars

Both the sun and moon are quads that rotate at a fixed distance around the player. This allows them to appear “infinitely” far away.

After drawing the sun and moon but before drawing the terrain, I reset the OpenGL depth buffer. This means that they are always drawn behind blocks, no matter the player’s y-coordinate.

// this function is called every time our scene is drawn, which is 60 times
// a second in our engine.
void paint() {
  /* previous rendering stuff */

  sun.render();
  moon.render();

  // reset depth buffer so all celestial objects are drawn behind terrain
  glClear(GL_DEPTH_BUFFER_BIT);

  // keep on renderin'
  terrain.render_terrain();
}

Each individual star is a tiny quad that I rotate and scale by a random amount. Originally, the stars also rotated around the player, but that looked too unrealistic when the player climbed things. Instead, it only follows the player’s xz-coordinates, but rotates at a fixed y-height.

Clouds

A giant quad floats above the player at all times. On it, I sample a portion of a large texture that contains all the possible clouds in the world:

Cloud texture is opened in Photoshop.

This was created in Photoshop. I used a noise function as the base, reduced the number of colors, and deleted the rest to make somewhat realistic-looking swirlies.

When I send the texture to the GPU, I set these options so that the texture repeats itself for UV values outside the normal 0.0-1.0:

glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_REPEAT);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_REPEAT);

This way, I can offset the UV coordinates based on the player’s current world position. When the player moves forward, the UV coordinates translate “back,” which gives the clouds the appearance of having been left behind. Take a look at jdh’s implementation to get an idea of the math.

Overall the sky system is pretty rudimentary, but with some good skybox colors, animation times, and tweaking, it feels very cohesive and looks very similar to Minecraft’s.

Flood fill lighting

This section could get its own entire write-up, but there are many articles on the internet that do a much better job explaining than I ever could. I’ll link some here:

In our engine, I store a 1D array in each chunk that tracks each block’s light level. Just like Minecraft, our light scale ranges from 0-15.

// using integers are overkill for our purposes and take up more space
std::array<unsigned char, 16 * 16 * 256> light_level;

When generating a chunk’s block data, I track all the naturally occurring light sources, like lava. After we finish, I run the flood fill algorithm on each source, starting with a light value of 15. It’s important to do this after we populate the blocks array so that the algorithm doesn’t propagate through opaque blocks.

We add a new vertex attribute that tracks the light level. In our chunk fragment shader, we multiply our default block color by this additional brightness. Make sure to clamp it between 0.0-1.0 so it doesn’t overpower.

Note that each block face samples the light level from its neighbor block. Depending on the light location, this also means each block face can be affected by a different light value! That’s what makes this system feel more dynamic and lively, I feel.

Inventory

The inventory system is stupidly simple. Here it is in all its glory:

struct Inventory
{
  std::array<Block, 9> slots;
  int active;
};

Pressing numkeys 1-9 (or scrolling with mouse wheel) sets active accordingly. Whenever we want to use the currently selected block, we index into slots using active. This is nice because we don’t have to keep track of what Block is selected, that is abstracted away for us.

GUI

The UI took a bit more planning. Every UI element inherits from a base class that stores our screen’s width, height, and aspect ratio. UI lives in 2D space, so I can use UV-like coordinates to specify each UI element’s position on the screen.

For example, if we wanted to draw a crosshair texture on a square at the center of the screen, I could use the following vertex positions:

class Crosshair : public UI
{
  void set_vbo_data();
};

void Crosshair::set_vbo_data() {
  // upper right, lower right, lower left, and upper left corners
  pos = {glm::vec2(0.51f, 0.51f),
         glm::vec2(0.51f, 0.49f),
         glm::vec2(0.49f, 0.49f),
         glm::vec2(0.49f, 0.51f)};
}

We store the index, position, and UV buffers as std::vector member variables in UI. The base class only has one function, render_ui().

void UI::render_ui() {
  int diff = screen_w - screen_h;
  float margin = diff / 2.f;

  std::vector<glm::vec2> scaled_pos;

  for (const glm::vec2 &p : pos) {
    scaled_pos.push_back(glm::vec2(p.x * screen_w / aspect_ratio + margin, p.y * screen_h));
  }

  // send all of this correctly scaled vertex data to the GPU
}

We’re allowed to define our UI element positions on a relative 0.0-1.0 scale because the UI class takes care of transforming it into actual positions on our current screen resolution. Furthermore, the x-values are offset by a margin that centers everything.

Diagram showing how the UI is scaled up and doesn't matter what screen resolution we are.

I make a number of assumptions about the player’s display:

However, I feel that these are fair tradeoffs (who is playing Minecraft in portrait mode??). Furthermore, this is more than enough for our crosshair, hotbar, and text to scale well and work across all screen resolutions.

The hotbar UI naturally uses the base class as well. To render the currently active slot, I add an additional offset to its x-position depending on what slot number we’re on.

void HotbarActiveSlot::set_vbo_data() {
  // turns out that each slot takes up 7% of the renderable UI space!
  // `active` comes from the current selected slot in our hotbar.
  float offset = active * 0.07f;

  pos = {glm::vec2(0.255 + offset, 0.13),
         glm::vec2(0.255 + offset, 0.06),
         glm::vec2(0.185 + offset, 0.06),
         glm::vec2(0.185 + offset, 0.13)};
}

Text rendering

Picture this. It’s 4 AM and there are only a few hours left before the due date. The engine is done, but falling asleep now would actually feel worse. You decide to go all out on one last thing.

The end result? Very, very basic text rendering.

The font bitmap file is open in Photoshop.

The idea is simple: map a basic set of characters to UVs on a sprite that contains the characters as a bitmap. This is not a new idea—far, far from it—and there are probably thousands of implementations online, but

The question now becomes “how do I fill this character-to-UV map in a smart way?” I mean, I only have to map 78 characters by hand, which wouldn’t have been too bad…

Remember that in C++, char really just represents the ASCII code of the characters. If we order the bitmap characters in ASCII order in our sprite, generating the map on the fly becomes really easy: just increment c to get to the next character! Our glm::ivecs correspond to coordinantes on the font sprite.

std::unordered_map<char, glm::ivec2> char_uv_map;
char c = 'a';

for (int i = 0; i < 26; i++) {
  char_uv_map[c] = glm::ivec2(i, 1);
  c++;
}

From there, I was able to render a single character, and then a string, which is really just many chars at once. My Text class inherits from UI as well to make things easier. I could have extended this system to add information about kerning, offsets, etc. Unfortunately, I fell asleep soon after.

Debug information

I did have time to, however, add an in-game debug mode that shows the player’s current position, as well as the current chunk (press F3 just like Minecraft):

Debug mode is turned on, and shows the current player position as (32, 158, 32) and chunk coordinate as (32, 32).

This helped my teammates easily share cool structures they found for our showcase reel: just take a screenshot! I thought this was a great way to immediately start using my text system.

Conclusion

This project has been very rewarding and is probably the most fun I’ve ever had with a class project. I’ve learned a lot about OpenGL, C++, and rendering in general. I’ve also become very aware of all the heavy lifting that engines like Unity or Unreal do for us.

I’m proud of my UI system, which came together at the very last minute but turned out really well. I also really enjoy how the sky looks. Prettier than Minecraft, I dare say?

Anyways. Thanks for reading this far.