The story of the non deterministic Replay

Posted in Gemserk and tagged development, infrastructure, lockstep, networking, synchronized simulation, techniques, tools on Sep 26, 2016

This is the story of how I discovered my simplified replay system wasn’t so deterministic as I believed because I had an ugly bug, but read the post if you want to know where exactly.

While integrating the lockstep engine on the game I am working on, I decided to do something to save and load replays to be able to easily reproduce some bugs I was experimenting. After I had that done and working, it was pretty awesome to see I can replay the same game multiple times (food for another blog post by the way), however, I thought it could be fun and easy to play them faster, why not.

Since I have a fixed time step logic, it should should be pretty straightforward, simply use a multiplied time and then the fixed timestep logic would do all the work and the game logic shouldn’t notice the change. I decided to give it a try and it worked…. almost, when playing the replays at higher speeds I noticed some visual differences but I wasn’t totally sure (it could be interpolation code).

To verify, I went back to the test project, where I had the moving box, and test it there, but I needed some way to be sure. Since I have already a way to calculate checksums of the game state, I used that to verify the game states when playing replays at different speeds (from 2x to 16x).

It failed, even though it only failed to validate some frames, following frames were not necessarily invalid (this is something important to consider).

Image 1: It shows one of the best tools in the world to check game states when replaying a game.

So, I was right, I saw some differences, I could be sure that something was happening. The thing was, with only the checksums I couldn’t know what the real difference was. Next step, making something to detect it.

In order to do that, I had to change to start saving (at least for debug) the game state, not only the checksum, and to check differences between stored game states in the replay and the current game state (when replaying the game) when checksum validation fails. It worked too, now I have the exact place where the differences are.

Image 2: It shows why serializing all the game state in one string is the best thing to do in your life.

After testing it a bit, I noticed another curious thing, the game validation wasn’t always failing given the same replay and the same speed. That gave me a hint that the problem was probably not related with the game code itself (the moving box).

So, if I played the replay at 1x, it was validated properly. If I played the replay at 8x, it failed, most of the time, but not always. So, it seems there is something related with speed I don’t understand yet.

I decided to test the same replay but with Unity timescale modified, my first test was using 1x for replay but 5x for the timescale, validation failed, then the opposite, 10x for replay but 0.1x for timescale, and it worked well. So the problem seems to be related with my accumulator logic inside the fixed timestep logic?

Some test cycles later, it turns out that, it was indeed a bug in one of the core classes of the engine!

The problem was on my class LockstepFixedUpdate, the first version was overriding the Update() method and performing lockstep logic, it worked ok if only at most one fixed update is processed, but in the case a big delta time arrives, it only process lockstep logic once for the first fixed update and never again.

That means that, in case replay commands were to be processed in frame 3, we are in frame 1 and a big dt of 10 frames arrives, then lockstep logic checks only in frame one and never again until all 10 frames were processed. This bug even bypass the lockstep!

Since I made a test to replicate the bug, it was really easy to fix it, I changed to process the lockstep logic with each fixed step updates and it works fine now, I have high speed replays!! YEAH!!

Conclusion

In the process of finding this bug I started to expand the engine support and create better tools, this is really important if I want to build something solid over it.

The only way to detect issues as soon as possible is to iterate over the engine as soon as possible and to do that, use cases are needed and games provide the best use cases. In my case, I am using not only the game I am trying to make but also other similar games as references when deciding what I want and how I want to test it, for example, having replays, being able to play the replay at different speeds, being able to save the replays, etc. Also, being able to replicate a bug in a small test case where you can iterate quickly to fix it is super useful.

Detecting (and having) problems like this in a small and simple game gives the idea of the complexity of a medium to big game, all the variables and the difficulties, it is not something to underestimate, so when developers say they couldn’t add multiplayer features to their game because it was really hard to do it, it is not a lie.

I love all of this stuff, even though I understand it is not an easy path.

To complete this post, here is a video showing a prototype of how I load and play a replay which was created by playing with two players in LAN, one was my computer and the other was my phone:

The quote of the day is ‘Fail as much as possible, as soon as possible to avoid failing when it is too late’.

Hope you enjoyed the journey.

What to consider as game state when validating a simulated synchronization.

Posted in Gemserk and tagged introduction, lockstep, networking on Sep 19, 2016

One really important thing when performing a synchronized simulation is having a way to check if all clients have the same simulation state. In order to do that, it must be clear which parts of the game state are important for the synchronized simulation and which parts aren’t. Obviously, this depends a lot on the game.

The idea of this blog post is to talk briefly about what is normally an important value for a game, and what isn’t.

What is not important for the game state

Let’s talk first about what is not important for your game. Even though it depends on the game, there are some common things that shouldn’t be considered as part of the important game state.

These things are normally all the stuff that doesn’t affect the game outcome, in other words, all the stuff that could be removed from the game and the game will be the same. For example, the smoke particles coming out from a bonfire, or even the current playtime of fire sound effect.

Also, things that could be derived from important gamestate even though they could affect the outcome in some way. For example, the units health bar current fill percentage could be derived from the unit health, that value could be used by the game logic but since it is directly derived from other values, it is not considered important.

What is important for the game state

All the things that could affect directly the game outcome are important for the game state. If you have a way to restore the game from a saved state, you have to be able to finish the game the same way each time you restore it.

For example, in a game like Doom, if you save and load you have to have the same health, armor, weapons and ammo, you should be in the same world position you were and enemies too, with same health, etc.

In Starcraft, what is important is your current vespene gas and your minerals since that decide what you can build or not, your current units with their health, your structures and their queued orders, the revealed map, etc. Things that are not so important include the blood in the ground when your units die (if you are playing Terran of course), your current units selection, the exact frame of a unit animation (unless the game depends on that to fire a skill).

If you are just saving the game state to be restored locally, you may want to store more stuff that you need since that shouldn’t matter. However, in the case of checking state synchronization between machines, having stuff that could differ and doesn’t affect the game outcome is not a good thing to have since you are not only wasting bandwidth but also risking the synchronization check.

Next time

Currently I am working on a way to represent the game state of a game I am working on and I want to have both features, a way to check synchronization between machines as well as having a way to save and restore the game state. I hope to talk about the benefits and problems of those features in the next posts, as well as having a more detailed information on the concepts of this blog post too.

Even though this blog post was a bit basic, I hope you liked it. I just wanted to talk a bit about the challenges I am dealing with right now.

Understanding Unity Networking (Unet)

Posted in Gemserk and tagged introduction, networking, unity on Sep 12, 2016

The objective of this blog post is to give you a first look at Unity Networking (Unet) based on my experience with it, and from my point of view.

Introduction

Unet is the Unity client/server architecture solution for networking, which provides the developer a wide range of options from high level logic like automatically synchronizing game objects state to low level like directly sending bytes through a network connection.

To have a quick understanding of how it normally works, here is a quick example. Imagine there are 3 machines connected, one server and two clients. Each machine is running the same game, that means the same code and the same Scene with same GameObjects. The code has a way to know if it is on a server or a client, and if has authority or not, and take action according to that. For example, for non player objects the server is the one with the authority and for player objects only one client has authority. Since it is a client/server architecture actions are processed on the server and then synchronized back to the clients.

note: I wrote normally because it is not always the case, you can have have objects created only on some clients, or on the server, etc.

Unet

The idea here is to give you a quick look of what I explored so far.

The NetworkManager is Unet point of entrance. It is the way to host or join a game. It is responsible for creating the player object for each client, etc. As a Unity behaviour, it allows configuring the player prefab, start positions, different scenes for being offline or online, or even advanced stuff like configuring connection timeouts, etc.

All Unet network objects must have a NetworkIdentity behaviour, this is the way to identify an object is the same in different machines, used for synchronzation for example.

The NetworkBehaviour is the base class for behaviours that want to know about networking. Subclasses of this class can override easy methods to be aware of networking events, automatically synchronize fields, invoke methods on the server or on the clients, and to know current code has authority over the object or not (since normally all objects are created in each machine). Here is a code sample containing some of these concepts:

public class NetworkExampleClass : NetworkBehaviour
{
  [SyncVar]
  public int health;

  int maxHealth = 100;

  public Color myColor;

  public GameObject bombPrefab;

  public void Start()
  {
    if (isLocalPlayer) {
      myColor = Color.blue;
    } else {
      myColor = Color.red;
    }

    if (isServer) {
      Debug.Log ("Object started on server");
    }
  }

  // only want to process this Unity callback on clients (avoid server update)
  [ClientCallback]
  public void Update()
  {
    // don't want to process on the client that is not the object owner, this works only
    // for the player object I believe.
    if (!isLocalPlayer)
      return;

    if (Input.GetMouseButtonUp (0)) {
      CmdTakePotion(10);
    }

    if (Input.GetMouseButtonUp (1)) {
      CmdSpawnBomb(UnityEngine.Random.insideUnitCircle * 100);
    }
  }

  // send command to server to perform authoritative logic
  [Command]
  public void CmdTakePotion(int hitpoints)
  {
    health += hitpoints;
    if (health > maxHealth)
        health = maxHealth;
    RpcShowPotionFeedback (hitpoints);
  }

  // send command to clients to perform visual logic
  [ClientRpc]
  public void RpcShowPotionFeedback(int hitpoints)
  {
    CreatePotionFeedback ();
  }

  public void CreatePotionFeedback(int hitpoints)
  {
    // TODO: create visual feedback for taking a potion
  }

  [Command]
  public void CmdSpawnBomb(Vector2 position)
  {
    GameObject bombObject = GameObject.Instantiate (bombPrefab);
    bombObject.transform.position = position;

    // this spawns the object on all clients and syncrhonizes them with server object (identity, etc)
    NetworkServer.Spawn (bombObject);
  }

}

(note: this code may not compile, it wasn’t tested)

And here is a quick explanation of the example concepts:

isServer: this property lets you know if you are on the server or not.
hasAuthority: this property lets you know if you have authority over the game object or not.
isLocalPlayer: this property tells you if the gameobject represents the player on the local machine, to avoid for example moving another players gameobjects when processing the input.
[SyncVar]: this field annotation it is a way to specify you need a field to be synchronized over the same game object running in each machine (the object identified by the same network id). For example, the health was decreased from 10 to 5 in the server, then all clients must show the change in the health bar.
[Command]: this method annotation it is used on methods that want to be invoked from a client to run in the server, their names must start with Cmd prefix. For example, if you want to do damage over a unit you send that to the server since the server is the authority here.
[ClientRpc]: this method annotation it is used on methods that want to be invoked from a server to run in all clients, their names must start with Rpc prefix. For example, the server detected some unit died (after it received damage), then it invokes a RpcUnitDeath(), so each client show the unit death animation.
Other methods annotations like [ClientCallback] and [ServerCallback] are an easier way to not having to check isServer property in the code, it is used for Unity automatic gameobject callbacks like Update or Start methods.
NetworkServer.Spawn(): it is the way to synchronize server creation of objects on clients, that means creating a game object on each client with the same network id.

For automatically synchronizing state there are already some behaviours like the NetworkTransform. These behaviours not only provide an automatic way to synchronize a GameObject common state but, in some cases, also a way to configure interpolation strategies. Here is a picture showing the NetworkTransform configuration:

Note: in my tests I couldn’t find a way to make the transform interpolation to look smooth. Even looking at other developers examples, they normally create their own interpolation code to fix that.

Here is a video showing a quick example I did using Unet, there are two clients, the left one is also the server. Each player can control one hexagon by clicking where to move it, the movement logic is performed in each client machine and automatically synchronized using a NetworkTransform.

Conclusion

The Unet high level concepts are a way to get, as fast as possible, a multiplayer game working, and depending the kind of game you are making, it may fit perfectly your needs. The low level access may give you a real control of what is happening behind the scenes in case you need to optimize your game networking usage.

As a developer, being flexible to use from high to low level stuff is always an advantage, and that is a really good point of Unet. Also, even though it wasn’t so easy to understand for me in my first tests, I believe the HLAPI is not hard to use and gives a good start point, and, like other Unity stuff, it is really good for quick prototyping.

Another really good point is that it is open source which means I can take a look at implementation details if I want, or understand how something work behind the scenes, or even propose bug fixes.

The bad part is that the documentation is not good, but that happens for all the Unity stuff, when you need to understand something in detail the documentation says nothing or it is even wrong. Also, the first Unet tutorial results wasn’t what I expected, the movement was really choppy and it didn’t work so well (I believe they updated it, I tried the one for Unity 5.3.x), that was something that made me think about Unet not being the solution I wanted for my games, but in the end it was just a really bad example of Unet features.

Also, if you are making an action game, I don’t know if Unet provides good support for client side prediction, server reconciliation or lag compensation but it shouldn’t be a problem to implement what you need over Unet since it is very flexible.

Right now I am following another aproach of using low level send messages API since for my game I am trying to reduce the bandwidth cost by sending only player actioons and performing a synchornized simulation, something I will try to write about in a future blog post.

References

Some references in no specific order.

Tutorial videos for making a survival cooperative game

Unet overview

Unity forums - Multiplayer networking

Unity forums - Collection of multiplayer samples

Unity networking source code at Bitbucket

A simple implementation of authoritative movement on top of unet

A basic analysis of Clash Royale multiplayer solution

Posted in Gemserk and tagged analysis, clashroyale, howto on Sep 5, 2016

The idea of this post is to analyze in a superficial way the multiplayer solution behind Clash Royale. Before starting the analysis, you must know that I don’t have previous experience making multiplayer games, I am just learning, and all the analysis done here is just a theory.

So, you surely know about Clash Royale but in case you don’t, it is an online multiplayer 1v1 RTS game where the first player to destroy the other’s towers win. To do so, they play cards which transforms into units that advance and attack enemy towers, structures that spawn units or attack enemy units or special powers which can be used to perform damage, among other things. Cards cost energy that regenerates over time (to a maximum). I recommend it since it is a great game, and it is really really polished in every detail (including multiplayer).

Here is a video explaining and showing the game:

Analysis

Since the game is very competitive, they probably have an authoritative server to validate player actions to avoid cheating. For example, a Player could say “I played card X” to the server, the server has to validate the player had that card in hand to use it, and enough energy.

In each game there could be several units at the same time (like 30 units in the worst case), also, there are tons of games being played at the same time. In order to make the game run over mobile networks and to support all those games, with all those units at the same time, they have to reduce the bandwidth to the minimum.

One strategy could be to compress the data sent, other could be to send data not so frequent and to interpolate to be as smooth as possible. However, considering that each unit has a position, looking direction, target, health, animation frame, among other stuff, that could still be a lot of data. I am guessing here that they follow another approach like a synchronized simulation of the game in each client and, since it is not so CPU heavy, every mobile device nowadays (game was released March 2016) shouldn’t have problems running the game logic.

Another thing that made me think about synchronized simulation was that every player action is not performed instantly, it has a small delay or a cast time. That could be a design choice but I believe it considers the fact that actions must be synchronized between players and having a delay allows them to do that.

If they are simulating in the client, they have to control the simulation to be deterministic or they must have some way to fix the game state if it was desynchronized at some point.

If I remember correctly, they follow an approach that the game never stops, players could be disconnected for a while and then reconnect and continue playing, they just lose part of the game (couldn’t perform actions). That could be used also for player desynchronization. Don’t know the resynchronization strategy but maybe the server sends game state snapshot and the player continues from there.

They even have game replays, so I am guessing that simulating the game in each device could help in reducing the cost of watching a replay even though they could replay it in a server if they want since they probably have tons of servers :).

Conclusion

My guess is that the game has a client/server architecture where both the server and the client simulate the game synchronously. The server is in charge of validating player actions and responsible of deciding the real game state in case of desynchronization.

As I said at the beginning of the blog post, this is a superficial analysis based on my current knowledge. If I wanted to perform a deeper analysis I could have follow another approach like doing some reverse engineering over the game connections to validate some of my guesses but that wasn’t the blog post purpose.

And here is a really fun video of the game to finish the post:

Lockstep multiplayer first steps

Posted in Gemserk and tagged development, diary, infrastructure, lockstep, networking, prototype, synchronized simulation on Aug 30, 2016

In this Gamasutra article, Tundra developers explain their approach to minimize problems when developing a lockstep multiplayer platform used for their game Rapture - World Conquest. Based on that, my first approach was to create and understand a deterministic lockstep logic without even considering networking yet, to focus on one problem at a time.

My objective is to have a reproducible game that given the start state and all the players actions during time, the game can be played again and the final state will be the same.

Lockstep logic

The first step was start by creating a logic to encapsulate the fixed game state for physics and game logic, and a lockstep to perform player actions. As I started without considering networking yet, all the player actions are directly enqueued locally by the user input or by a saved replay.

A lockstep logic means that there are some conditions that, if not met, the game should pause the simulation and wait for them. In the case of multiplayer games this is when the game have to wait for other players actions that didn’t arrive in time (and where the waiting for other players dialog is shown in some games).

My code looks something like this pseudo code:

update(dt) {
  if (lockstepLogic.IsLockstepTurn()) {
    if (!lockstepLogic.IsReady())
      return;
    lockstepLogic.Process();
  }
  // normal accumulator logic for the fixed gamestep
}

Since I create all player actions locally, the lockstep never happens right now but it is a good practice to simulate and test it anyways.

Testing it

My first prototype using this logic is a box that moves over the screen. When the right mouse button is pressed, the game enqueues a player action to move the box to the specified position, then when the lockstep logic is processed the box receives the command and start moving to that position.

Interpolation

The moving logic was pretty simple, based on start and destination positions and a speed, the box moves itself to the destination in a straight line. Since that logic is performed in each fixed gamestep, the player see the box jumping between positions, it works but it doesn’t look so good. To improve that and make the movement smoother I created a simple interpolation code.

Interpolation depends a lot on what you are trying to interpolate, in some cases could be a simple as the code I used there but there are other cases like the bouncing ball which need more data to create a better interpolation.

Note: adding interpolation means we have now a delay of one fixed gamestep between the box position being rendered and the real position in the game since we are using previous and current positions info for the interpolation.

My first replays

By recording all the player actions with the fixed gamestep frame they were executed, I created a basic way to replay the “game” by just reseting the game state to the initial state and start enqueing all the recorded actions in each corresponding frame.

Here is a video of the progress so far:

Yeah, I know, it’s not a great thing, but at least I am starting to understand some of the challenges of making multiplayer games.

Validating simulation

To validate the game is always performing the same simulation given the same input, I added a checksum calculation based on the game state (for now just the moving box state) which is saved from time to time to use later as validation when simulating the game again. The idea was to start defining an API to get the important game state to consider when validating the simulation, and also to start testing game state validation. The code looks like this:

update(frame) {
  if (IsChecksumFrame(frame)) {
    if (recording) {
      checksumRecorder.RecordState(frame, CalculateChecksum());
    } else {
      if (!checksumValidator.IsValid(frame, 
          checksumRecorder.SavedChecksums, CalculateChecksum())) {
        throw Exception("current game state is not valid!");
      }
    }
  }
}

In my first tests I wasn’t reseting the game to the initial state properly so my game was producing different checksums when reproducing the saved player actions, checksums validation started to work after that was fixed.

How am I calculating the Checksum? Right now I am not sure exactly what algorithm to use nor which game state should I consider for the checksum, so what I did was to encapsulate that in some interfaces and implemented a basic way to get both. For the checksum I am using a simple MD5 over a string, and for the game state, well…, a string with all important values concatenated (the moving box position, destination, if it is moving or not, etc).

string CalculateChecksum() {
  string gameStateValue = game.GetGameState();
  return MD5.CalculateHash(gameStateValue);
}

string GetGameState() {
  string gameState = "";
  foreach (object in gameObjects)
      gameState += object.GetGameState();
  return gameState;
}

For a future implementation what I know is that the game state should be composed with important values that can affect the simulation, so I shouldn’t care about about audiovisual stuff like particles, effects and sounds.

Also, the game state concept could be used to reset to the initial game state or even for saved game states (to easily replicate some bug for example), and finally, it could be used to synchronize and validate state if for some reason I end up using a client/server architecture.

Next frontier: Determinism?

For now I didn’t explore determinism realm because the solution really depends on the game logic but at the same time I must have it clear before starting the game code. One of the next steps is probably start testing with fixed point math, not sure yet, the idea is try to follow an approach similar to the gamasutra article’s of reducing non determinism problem to the minimum before going multiplayer.

If you want to take a look all the code used for this blog post, here is the link.

Gemserk

Game Development Company

The story of the non deterministic Replay

Conclusion

What to consider as game state when validating a simulated synchronization.

What is not important for the game state

What is important for the game state

Next time

Understanding Unity Networking (Unet)

Introduction

Unet

Conclusion

References

A basic analysis of Clash Royale multiplayer solution

Analysis

Conclusion

Lockstep multiplayer first steps

Lockstep logic

Testing it

Interpolation

My first replays

Validating simulation

Next frontier: Determinism?

Other links

Developers

Recent Posts

Tags

Archives