
The Development Journey: From Concept to a Learning Machine

Every project is a story of learning, facing challenges, and finding solutions. The development of "Train Your Foes" was a significant journey that began with foundational learning in multiple domains and ended with a successful, custom-built AI implementation. This document chronicles that entire process.

Phase 1: Forging the Foundation

Before a single line of game code was written, a significant amount of time was dedicated to acquiring the necessary theoretical and practical knowledge in three key areas: Deep Learning, Reinforcement Learning, and Game Development.

1.1. Understanding the 'Brain': Deep Learning

The journey began at the core of modern AI: understanding how neural networks function.

  • Resource: Deep Learning Specialization by Andrew Ng (Coursera)
  • Key Learnings: This course provided a critical, low-level understanding of the mathematics and architecture behind neural networks. We explored concepts like forward and backward propagation, activation functions (Sigmoid, ReLU), loss functions, and gradient descent. This foundational knowledge was essential for demystifying what happens inside a complex AI model.
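The concepts above can be sketched in a few lines of plain Python. This is an illustrative toy, not the course's code: two common activation functions and a gradient descent loop minimizing a simple one-dimensional function.

```python
import math

def sigmoid(z):
    # Squashes any real number into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    # Passes positive values through unchanged, zeroes out negatives.
    return max(0.0, z)

def gradient_descent_step(w, grad, learning_rate=0.1):
    # Move the parameter a small step against its gradient.
    return w - learning_rate * grad

# Minimize f(w) = (w - 3)^2; its gradient is 2 * (w - 3).
w = 0.0
for _ in range(100):
    w = gradient_descent_step(w, 2 * (w - 3))
print(round(w, 4))  # converges to 3.0, the minimum of f
```

The same update rule, applied to every weight in a network with gradients supplied by backpropagation, is all that "training" means at the lowest level.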

1.2. Teaching the 'Brain': Reinforcement Learning

With an understanding of neural networks, the focus shifted to the specific discipline of Reinforcement Learning (RL), which is centered on training an agent to make optimal decisions.

  • Resource: Reinforcement Learning series by Sentdex (YouTube)
  • Key Learnings: Sentdex's practical, code-first approach was invaluable for understanding core RL concepts. We learned the fundamentals of the agent-environment loop, the critical role of the reward function, and the mechanics of a Q-Table. This is where we first grasped the importance of the key hyperparameters:

    • Alpha (α): The learning rate.
    • Gamma (γ): The discount factor for future rewards.
    • Epsilon (ε): The exploration vs. exploitation rate.
  • Resource: Farama Foundation's Gymnasium Documentation
  • Key Learnings: To apply RL theory, we needed a sandbox. Gymnasium (the successor to OpenAI Gym) provided a standard toolkit of environments to test algorithms. It was here that we first saw RL models interacting with a simulated environment in real time.
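The agent-environment loop, the Q-Table, and all three hyperparameters come together in a short script. This is a hedged sketch, not code from the project: the one-dimensional "corridor" environment is invented for illustration, but it mimics Gymnasium's reset()/step() shape, and the update line is the standard tabular Q-learning rule.

```python
import random

class CorridorEnv:
    """Toy 1-D corridor: start at position 0, reach position 4 for reward +1.
    A hypothetical stand-in that mimics Gymnasium's reset()/step() interface."""
    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):  # action: 0 = move left, 1 = move right
        self.pos = max(0, min(4, self.pos + (1 if action == 1 else -1)))
        done = self.pos == 4
        return self.pos, (1.0 if done else 0.0), done

random.seed(0)                            # reproducible training run
alpha, gamma, epsilon = 0.1, 0.95, 1.0    # learning rate, discount, exploration
q = [[0.0, 0.0] for _ in range(5)]        # Q-Table: 5 states x 2 actions

env = CorridorEnv()
for episode in range(500):
    state, done = env.reset(), False
    while not done:
        # Epsilon-greedy: random action with probability epsilon, else greedy.
        if random.random() < epsilon:
            action = random.randrange(2)
        else:
            action = 0 if q[state][0] >= q[state][1] else 1
        next_state, reward, done = env.step(action)
        # Q-learning update: nudge Q(s, a) toward reward + discounted best future value.
        q[state][action] += alpha * (reward + gamma * max(q[next_state]) - q[state][action])
        state = next_state
    epsilon = max(0.05, epsilon * 0.99)   # decay exploration as training progresses
```

After training, the greedy policy reads straight out of the table: near the goal, "right" has a visibly higher Q-value than "left".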

1.3. Building the 'World': Game Development with Unity & C#

Finally, to create the environment for our AI, we needed to master the tools of game development.


Phase 2: Exploratory Mini-Projects

With a solid theoretical foundation, we undertook five distinct mini-projects to test our skills in a practical setting before tackling the main project.

2.1. Mini-Project 1: Falling Blocks Game

  • Objective: To master the absolute fundamentals of the Unity Engine and C# scripting by creating a simple, complete game from scratch.
  • Implementation: A "falling blocks" style game was developed, which required handling player input, manipulating GameObjects in real-time, implementing basic physics, and managing a simple game state (like a scoring system).
  • Key Challenge & Outcome: This project was instrumental in building confidence with the Unity editor and workflow. It provided a hands-on understanding of the core loop: processing input, updating game logic, and rendering the result. It was the essential first step in practical game development.

2.2. Mini-Project 2: Pixelated Game with Kenney Assets

  • Objective: To build a more visually complete and polished game prototype, focusing on art integration and creating a cohesive player experience.
  • Implementation: Leveraging the popular "Kenney" pixel art assets, we built a more complex 2D game. This involved working with sprite sheets, animations, tilemaps for level design, and more sophisticated scripting.
  • Key Challenge & Outcome: This project taught us how to build a game with an intended aesthetic, moving beyond simple programmer art. It emphasized the importance of a well-organized asset pipeline and the process of building a full game loop, from a main menu to a playable level and a game-over state.

2.3. Mini-Project 3: Neural Network from Scratch in Numpy

  • Objective: To prove our understanding of deep learning theory by building a neural network from the ground up, using only Numpy rather than a deep learning framework.
  • Implementation: The network was trained on a dataset of footballer stats to predict their in-game rating.
  • Key Challenge & Outcome: A major roadblock occurred when the network began outputting the same prediction for every input. This is a classic sign of dying neurons or vanishing gradients. After extensive debugging, we discovered that careful initialization of weights and biases was the solution. This project was a powerful lesson in how sensitive neural networks can be and solidified our low-level understanding of their mechanics.
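The initialization failure above is easy to reproduce. This toy sketch (not the project's network) shows the symmetry problem: if every weight starts identical, every neuron in a layer computes the same thing, receives the same gradient, and the layer collapses to one effective neuron, which is one route to identical predictions for every input.

```python
import random

def layer_output(weights, inputs):
    # One dense ReLU layer: each row of weights is one neuron.
    return [max(0.0, sum(w * x for w, x in zip(row, inputs))) for row in weights]

inputs = [0.5, -1.2, 3.0]

# All-zero initialization: every neuron is identical, so every output is
# identical, and by symmetry every gradient is identical too.
zero_weights = [[0.0] * 3 for _ in range(4)]
print(layer_output(zero_weights, inputs))   # [0.0, 0.0, 0.0, 0.0]

# Small random initialization breaks the symmetry: each neuron starts different.
random.seed(42)
rand_weights = [[random.uniform(-0.1, 0.1) for _ in range(3)] for _ in range(4)]
print(layer_output(rand_weights, inputs))   # generally four distinct values
```

Schemes like Xavier or He initialization go further, scaling the random values by the layer size to keep activations from vanishing or exploding as depth grows.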

2.4. Mini-Project 4: The Mountain Car Problem

  • Objective: To implement the Q-Learning algorithm to solve a classic Reinforcement Learning challenge.
  • Implementation: Using Python and the Gymnasium library, we trained an agent to solve the "Mountain Car" environment, where a car must learn to rock back and forth to build momentum to climb a hill.
  • Key Challenge & Outcome: This project was a masterclass in the importance of the exploration-exploitation trade-off. By starting with a high epsilon (encouraging random actions) and slowly decaying it, we watched the agent go from random, useless movements to a confident, optimal solution. It was the first time we saw a Q-Table truly "learn" an effective strategy from scratch.
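Two pieces of machinery made this work: bucketing Mountain Car's continuous (position, velocity) observation into Q-Table indices, and the epsilon decay schedule. The sketch below is illustrative; the bucket counts and decay rate are assumptions, though the observation bounds match Gymnasium's MountainCar-v0.

```python
def discretize(obs, low, high, buckets):
    # Map each continuous observation dimension to an integer bucket index.
    idx = []
    for value, lo, hi, n in zip(obs, low, high, buckets):
        frac = (value - lo) / (hi - lo)            # 0.0 .. 1.0 across the range
        idx.append(min(n - 1, max(0, int(frac * n))))
    return tuple(idx)

# MountainCar-v0 observation bounds: position [-1.2, 0.6], velocity [-0.07, 0.07].
LOW, HIGH, BUCKETS = [-1.2, -0.07], [0.6, 0.07], [20, 20]

state = discretize([-0.5, 0.0], LOW, HIGH, BUCKETS)
print(state)  # (7, 10): a (position_bucket, velocity_bucket) Q-Table index

# Epsilon decay: start fully random, end mostly greedy.
epsilon, eps_min, eps_decay = 1.0, 0.01, 0.999
for episode in range(5000):
    epsilon = max(eps_min, epsilon * eps_decay)
print(round(epsilon, 3))  # 0.01: exploration has bottomed out at its floor
```

The discretized tuple indexes a 20 x 20 x 3 Q-Table (three actions: push left, coast, push right), which is small enough to learn from raw trial and error.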

2.5. Mini-Project 5: Jumping Car with ML-Agents

  • Objective: To familiarize ourselves with the Unity ML-Agents toolkit, which was our intended framework for the main project.
  • Implementation: We created a simple 3D environment where a car agent had to learn to jump over obstacles.
  • Key Challenge & Outcome: This project was a resounding success. The ML-Agents toolkit worked perfectly out of the box for this simple task. The agent learned quickly, and the integration with the Unity Editor was seamless. This success gave us the confidence (perhaps a bit too much) that ML-Agents was the definitive solution for our final, more ambitious project.

Phase 3: The Main Project - A Story of Failure and Discovery

Armed with foundational knowledge and successful mini-projects, we embarked on the main "Train Your Foes" project. Our initial vision was a complex, real-time hack-and-slash game where the boss AI would learn to counter the player's moves. We chose Unity ML-Agents as our tool.

This is where disaster struck.

3.1. The Great Failure of ML-Agents

Almost immediately, we hit a wall. Our attempt to use ML-Agents for the hack-and-slash game failed for several critical reasons:

  1. Version Hell: The first sign of trouble was the extreme difficulty in setting up the environment. ML-Agents is notoriously sensitive to version compatibility between the Unity Editor, the C# package, and the Python trainer. We spent countless hours wrestling with obscure errors, dependency conflicts, and deprecated functions.
  2. The Learning Black Box: When we finally got it running, the agent simply wouldn't learn. It would get stuck in a simple loop, repeating the same action endlessly. The complexity of a real-time game introduced a massive state space (player position, velocity, attack state, i-frames, etc.), and the reward signal was too sparse and noisy for the agent to make any meaningful connections.
  3. The Simulation Bottleneck: We realized that even if we could fix the learning, training would be impossible. RL requires millions of trials. Rendering a full 3D game for each trial is incredibly slow. The alternative—creating a "headless" simulation in a separate script—would mean re-coding our entire game's physics and logic from scratch, a project in itself.

3.2. The 'Aha!' Moment and The Pivot

After a week of failed attempts, the project was at a dead end. In a moment of frustration, we decided to try something radical. We took the simple, custom Q-Learning script we had written for the Mountain Car project and applied it to a simplified version of our game.

Suddenly, it worked.

Even though it was basic, the agent started to learn. The key difference was simplicity. The Q-Learning script, with its 2D table, was far less complex than the deep neural networks of ML-Agents.

This breakthrough coincided with a crucial piece of advice from a mentor, who had suggested from the very beginning that a turn-based RPG would be a much better fit for our learning goals. We had initially dismissed the idea in favor of the more ambitious hack-and-slash, but now the wisdom was clear.

A turn-based game had two massive advantages:

  1. Discrete State Space: The game state could be perfectly defined by a few variables (Player HP, Boss HP, Player Energy, etc.), making it ideal for a Q-Table.
  2. Instant Simulation: The game's logic was simple enough to be simulated in a headless C# script without any graphics. We could run tens of thousands of battles in minutes, not days.
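The first advantage is worth making concrete. When the state is a small tuple of discrete variables, the Q-Table can literally be a dictionary keyed by that tuple. The sketch below is a hypothetical illustration in Python (the shipped engine was C#), and the state variables, HP buckets, and boss actions are invented for the example, not the game's exact design.

```python
import random
from collections import defaultdict

ACTIONS = ["attack", "heavy_attack", "heal", "charge"]  # illustrative boss moves

def make_state(player_hp, boss_hp, player_energy):
    # Coarse HP buckets keep the table small while staying fully discrete.
    return (player_hp // 10, boss_hp // 10, player_energy)

# The whole Q-Table: one row of action-values per discrete state tuple.
q_table = defaultdict(lambda: [0.0] * len(ACTIONS))

def choose_action(state, epsilon):
    # Epsilon-greedy over the boss's discrete action set.
    if random.random() < epsilon:
        return random.randrange(len(ACTIONS))
    values = q_table[state]
    return values.index(max(values))

state = make_state(player_hp=73, boss_hp=100, player_energy=2)
print(state)                                  # (7, 10, 2)
action = choose_action(state, epsilon=0.0)
print(ACTIONS[action])                        # "attack": all Q-values start at 0
```

Because every battle reduces to transitions between such tuples, a headless simulator can hammer this table with tens of thousands of fights without ever rendering a frame.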

This was our path forward. We abandoned ML-Agents and the hack-and-slash concept and went all-in on a turn-based RPG with our own custom Q-Learning engine. This pivot from a complex, failing tool to a simpler, foundational approach that we deeply understood was the single most important decision in the project's development.