AI Training Process
The AI model described in the previous document is only as intelligent as the data it learns from. A robust and efficient training process was designed to populate the Q-Table with optimal values, transforming the agent from a random actor into a formidable, strategic opponent. This was achieved through a dedicated offline simulation environment.
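The value-update step that populates the Q-Table is the standard Q-Learning rule. The project itself is written in C#, but a minimal Python sketch of the update (with hypothetical state/action keys and illustrative learning parameters) looks like:

```python
from collections import defaultdict

# Illustrative hyperparameters; the project's actual values are not stated here.
ALPHA = 0.1   # learning rate
GAMMA = 0.9   # discount factor

q_table = defaultdict(float)  # keyed by (state, action), defaults to 0.0

def update_q(state, action, reward, next_state, actions):
    """Standard Q-Learning update: move Q(s, a) toward reward + discounted best next value."""
    best_next = max(q_table[(next_state, a)] for a in actions)
    q_table[(state, action)] += ALPHA * (reward + GAMMA * best_next - q_table[(state, action)])

# Example: reward an Attack taken from a (hypothetical) high-health state.
update_q("hp_high", "Attack", 10.0, "hp_high", ["Attack", "Defend", "Heal"])
```

Repeated over many simulated battles, updates like this are what drive the Q-Table values toward an optimal policy.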
1. The Training Environment
To train the agent rapidly, a headless simulation was created. This approach was critical for achieving a high volume of training episodes in a short amount of time.
- Framework: The simulation was built as a standalone console-based C# project in Visual Studio. It was a single C# file that contained all the necessary battle logic and the `QLearning` class.
- Key Advantage: By completely removing the Unity Engine's overhead (graphics rendering, physics, input handling), the simulation could focus purely on executing the battle logic. This allowed tens of thousands of battles to be simulated in minutes, a task that would have taken days or weeks inside the Unity editor.
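A headless episode loop of this kind can be sketched as follows. This is a Python stand-in for the C# console project, and the battle logic is a deliberately trivial placeholder; the function names and two-policy interface are assumptions, not the project's actual API:

```python
import random

def run_battle(agent_policy, opponent_policy, rng):
    """Play one simulated battle with no rendering or engine overhead.
    Returns True if the agent wins. Toy stand-in for the real battle logic."""
    agent_hp, opp_hp = 100, 100
    while agent_hp > 0 and opp_hp > 0:
        opp_hp -= 10 if agent_policy(agent_hp, rng) == "Attack" else 5
        if opp_hp <= 0:
            return True
        agent_hp -= 10 if opponent_policy(opp_hp, rng) == "Attack" else 5
    return agent_hp > 0

# With no graphics or physics in the loop, tens of thousands of episodes run in seconds.
rng = random.Random(0)
always_attack = lambda hp, r: "Attack"
wins = sum(run_battle(always_attack, always_attack, rng) for _ in range(10_000))
```

The key point is that the loop body is pure game logic, which is what makes the episode counts described below practical.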
2. The Training Opponent: A Scripted Player
An AI learns best when faced with a competent and consistent opponent. For the final training model, the Q-Learning agent was not trained against a purely random player. Instead, it faced a scripted opponent that followed a set of logical, heuristic rules to mimic a strategic player.
The scripted player's policy was as follows:
* If Health is high: Always use the standard Attack.
* If Special Attack is available: Prioritize using Special Attack for high damage.
* If Health is near 1/3 and Heal is available: Use Heal to recover.
* If Health is low: Make an unpredictable choice, randomly selecting between Attack and Defend.
This rule-based opponent provided a challenging and consistent benchmark, forcing the Q-Learning agent to discover genuinely effective counter-strategies rather than just learning to defeat a weak, random opponent.
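The heuristic policy above can be sketched directly. The original opponent is C# code; this Python version, the rule ordering, and the specific thresholds (what counts as "high" or "low" health) are illustrative assumptions:

```python
import random

def scripted_policy(hp, max_hp, special_ready, heal_ready, rng=random):
    """Rule-based opponent mirroring the heuristics described above.
    Rule order resolves overlaps: Special > Heal > low-health randomness > Attack."""
    if special_ready:
        return "SpecialAttack"                    # prioritize high damage when available
    if hp <= max_hp / 3 and heal_ready:
        return "Heal"                             # recover when near one-third health
    if hp <= max_hp / 4:                          # assumed "low health" cutoff
        return rng.choice(["Attack", "Defend"])   # unpredictable choice when low
    return "Attack"                               # default at high health: standard attack
```

Because the policy is deterministic except at low health, it gives the learner a stable target to optimize against.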
3. The Training Regimen
The training was conducted over a series of extensive runs to allow the Q-Table values to converge towards an optimal policy.
- Total Episodes: Various training sessions were performed, with some runs extending up to 100,000 battle episodes.
- Final Model Run: The Q-Table used in the final version of the game was the result of a focused 30,000-episode training run.
- Validation: After this training, the agent's learning was "frozen" (the Q-Table was no longer updated). It was then tested for an additional 1,000 episodes against the same scripted opponent to validate its performance and ensure the learned policy was stable and effective.
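The freeze-and-validate step amounts to switching off both exploration and Q-Table updates, then replaying episodes greedily against the same opponent. A Python sketch under those assumptions (the action set, helper names, and episode interface are all hypothetical):

```python
import random

ACTIONS = ["Attack", "Defend", "Heal", "SpecialAttack"]

def greedy_action(q_table, state, rng):
    """Frozen-policy action selection: no exploration, no learning updates."""
    best = max(q_table.get((state, a), 0.0) for a in ACTIONS)
    # Break ties randomly among equally valued actions.
    return rng.choice([a for a in ACTIONS if q_table.get((state, a), 0.0) == best])

def validate(q_table, run_episode, episodes=1_000, seed=0):
    """Replay battles with the frozen table and report the observed win rate."""
    rng = random.Random(seed)
    wins = sum(run_episode(lambda s: greedy_action(q_table, s, rng)) for _ in range(episodes))
    return wins / episodes
```

A stable win rate across the validation episodes is what confirms the learned policy has converged rather than merely fluctuated upward.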
4. Results and Performance Curve
The results of the training clearly demonstrated the agent's ability to learn and master the game. The learning curve was steep and decisive.
- Initial Performance (Episodes 1-100): In the very first battles, the agent's actions were essentially random. Its win rate against the scripted opponent was near 0%.
- Rapid Improvement (Episodes 100-5,000): After a few thousand episodes, the agent began to associate states and actions with positive outcomes. The Q-Table values started to converge, and its win rate rapidly climbed to approximately 90%.
- Peak Performance (Episodes 5,000-30,000): For the remainder of the training, the agent continued to refine its strategy, optimizing its Q-Table for even the rarest of edge cases. By the end of the run, it achieved a peak and stable win rate of 96-98% in the simulation.
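Learning curves like the one described are typically measured as a moving win rate over a recent window of episodes, so that early losses stop dragging down the figure once the agent improves. A minimal Python sketch (the window size is an assumption):

```python
from collections import deque

class WinRateTracker:
    """Moving win rate over the last `window` episodes, for plotting a learning curve."""
    def __init__(self, window=100):
        self.results = deque(maxlen=window)  # old results fall off automatically

    def record(self, won):
        self.results.append(1 if won else 0)

    def rate(self):
        return sum(self.results) / len(self.results) if self.results else 0.0

tracker = WinRateTracker(window=100)
for episode in range(150):
    tracker.record(won=(episode >= 50))  # toy curve: losses early, wins later
```

With this windowing, the reported rate reflects current performance: after the toy agent above starts winning, the tracker reads 100% even though half its lifetime episodes were losses.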
This dramatic increase in performance is a clear indicator of a successful training process and a well-designed reward function.
5. Final Tuning for Gameplay
An AI with a 98% win rate is a testament to the effectiveness of the algorithm, but it makes for an unfairly difficult and frustrating experience for a human player.
The final step in the process was game balancing. The AI's core logic and trained Q-Table were preserved, but other in-game parameters were tweaked slightly to give a human player a better chance of winning. Despite this tuning, the agent's learned strategies remained incredibly effective, still achieving a win rate of over 90% in internal testing, ensuring that the final boss remains a challenging and formidable foe.