Troubleshooting Guide
This document provides solutions to common issues that may arise when working with or attempting to replicate the AI in this project. Reinforcement learning can sometimes be a process of trial and error, and these tips can help debug common failure modes.
Issue 1: The AI Agent Repeats the Same Action Endlessly
Symptom: You notice that the boss AI gets "stuck" in a loop, always choosing the same action (e.g., "Attack") regardless of the game state.
Potential Causes:
- Flawed Reward Function: This is the most likely cause. The reward function might be unintentionally giving a small, consistent positive reward for one action while all other actions receive a reward of zero until the end of the game. The agent will quickly learn that this single action is the only one that guarantees a reward and will exploit it.
- Poorly Tuned Hyperparameters: An extremely low learning rate (α) combined with a low discount factor (γ) might cause the agent's learning to stagnate before it has a chance to explore the consequences of other actions.
- Lack of Initial Exploration: If the Q-Table is initialized to all zeros, and one action happens to produce a positive result first, the agent might never be incentivized to try the other zero-value actions.
Solutions:
- Review the Reward Function: Analyze your intermediate rewards. Ensure that all actions have the potential to contribute to the final outcome. Sometimes, it's better to simplify and use only a strong terminal reward (+1 for a win, -1 for a loss) to avoid misleading the agent.
- Random Initialization: Ensure your Q-Table is initialized with small, random values instead of zeros. This helps break ties at the beginning of training and encourages the agent to try all actions at least once.
- Experiment with Hyperparameters: Try increasing the learning rate (α) or the discount factor (γ) to encourage the agent to learn faster or to value long-term outcomes more.
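The project's Q-Table lives in a C# QLearning class, but the random-initialization and exploration ideas can be sketched in any language. The Python sketch below is illustrative, not the project's code: NUM_STATES and NUM_ACTIONS are assumed to match the 81-row, 3-value layout described under Issue 2, and the epsilon value is an arbitrary example.

```python
import random

NUM_STATES = 81   # assumption: matches the 81-row Q-Table described in Issue 2
NUM_ACTIONS = 3   # assumption: matches the 3 comma-separated values per row

# Initialize with small random values instead of zeros to break ties
# and encourage the agent to try every action at least once.
q_table = [[random.uniform(0.0, 0.01) for _ in range(NUM_ACTIONS)]
           for _ in range(NUM_STATES)]

def choose_action(state, epsilon=0.1):
    """Epsilon-greedy selection: explore a random action with probability
    epsilon, otherwise exploit the current best-known action."""
    if random.random() < epsilon:
        return random.randrange(NUM_ACTIONS)
    row = q_table[state]
    return row.index(max(row))
```

Even with random initialization, keeping some epsilon-greedy exploration during training is what ultimately prevents the agent from locking onto a single action.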
Issue 2: Indexing Errors (Index Out of Bounds)
Symptom: The game crashes with an IndexOutOfRangeException when the AI is calculating its state or trying to access the Q-Table.
Potential Causes:
- State Calculation Error: The function that discretizes and encodes the game variables (HP, Mana, etc.) into a single state index is producing a number outside the valid range (i.e., less than 0 or greater than 80). This can happen if a value like HP goes slightly above its maximum due to a bug, changing the result of the calculation.
- Q-Table File Mismatch: The saved qtable.txt file being loaded has a different structure than what the QLearning class expects. For example, it might have too many or too few rows, or an incorrect number of comma-separated values in a row.
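A defensive loader can catch a malformed file at load time with a clear error message, instead of an IndexOutOfRangeException later. The project loads the file in C#; this is a hedged Python sketch of the same validation idea, with the 81x3 shape taken from the description above.

```python
def load_qtable(path, num_states=81, num_actions=3):
    """Load a comma-separated Q-Table file, validating its shape before use."""
    with open(path) as f:
        lines = [line.strip() for line in f if line.strip()]
    if len(lines) != num_states:
        raise ValueError(f"expected {num_states} rows, got {len(lines)}")
    table = []
    for i, line in enumerate(lines):
        values = line.split(",")
        if len(values) != num_actions:
            raise ValueError(
                f"row {i}: expected {num_actions} values, got {len(values)}")
        table.append([float(v) for v in values])
    return table
```

Failing fast here also tells you whether the bug is in the saving logic or in the code that consumes the table.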
Solutions:
- Clamp Your State Variables: In the state calculation function, use Mathf.Clamp() to ensure that the input variables (like HP and Aura) stay within their expected range (e.g., 0 to maxHP) before being used in the discretization formula.
- Debug the State Index: Add Debug.Log() statements to your state calculation function to print the intermediate values and the final state index. Run the game and see if the index ever goes out of bounds just before the crash.
- Verify the qtable.txt: Manually inspect your saved Q-Table file. Check that it has exactly 81 lines and that each line has exactly 3 comma-separated numbers. If not, your saving logic may have a bug, or the file may be corrupted.
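The exact discretization formula is project-specific, so the Python sketch below only illustrates the clamp-then-encode pattern (with a plain function standing in for Unity's Mathf.Clamp()). The choice of four variables with three buckets each is an assumption that happens to produce 3^4 = 81 states, matching the table size above; the real variables and bucket counts may differ.

```python
def clamp(value, lo, hi):
    """Keep value within [lo, hi], like Unity's Mathf.Clamp."""
    return max(lo, min(hi, value))

def discretize(value, max_value, buckets=3):
    """Clamp first, then map the value to a bucket index in [0, buckets-1]."""
    value = clamp(value, 0, max_value)
    # min() handles the edge case value == max_value, which would
    # otherwise produce an index equal to `buckets`.
    return min(int(value / max_value * buckets), buckets - 1)

def state_index(hp, max_hp, mana, max_mana, aura, max_aura, dist, max_dist):
    """Encode four discretized variables into a single index in [0, 80]."""
    s = (discretize(hp, max_hp) * 27
         + discretize(mana, max_mana) * 9
         + discretize(aura, max_aura) * 3
         + discretize(dist, max_dist))
    assert 0 <= s <= 80, f"state index out of range: {s}"
    return s
```

Clamping before discretizing means a value like HP that drifts slightly above its maximum due to a bug still maps to the top bucket instead of an out-of-range index.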
Issue 3: The AI Fails to Learn (Win Rate Stays Low)
Symptom: After thousands of training episodes, the AI's win rate is not improving significantly and remains around 50% or lower.
Potential Causes:
- Reward Signal is Too Weak or Delayed: If you are only using a terminal reward, the agent may struggle to connect actions taken early in a long battle to the final outcome.
- Opponent Strategy is Too Random: If the scripted player opponent behaves too randomly, the AI might learn a policy that is effective against a random opponent but fails to find a truly optimal strategy.
- Bug in the Update Rule: A small mathematical error in the Q-Learning update formula within your GetReward function can easily prevent the Q-values from converging correctly, effectively breaking the learning process.
Solutions:
- Implement Reward Shaping: If you are only using terminal rewards, consider adding small, intermediate rewards (as was done in this project) to give the agent more frequent feedback.
- Train Against a Deterministic Opponent: Ensure your scripted opponent follows a consistent, rule-based strategy. This provides a stable environment for the agent to learn in.
- Verify the Bellman Equation: Meticulously double-check your Q-value update logic against the standard Q-Learning update rule: Q(s,a) ← Q(s,a) + α * (reward + γ * max_a' Q(s',a') - Q(s,a)), where s' is the next state. A common mistake is taking the max over the current state s instead of the next state s'.
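As a reference point for that check, here is a minimal Python sketch of one tabular Q-Learning step. It is not the project's C# code, and the default alpha and gamma values are illustrative only.

```python
def q_update(q_table, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-Learning step: move Q(s,a) toward the TD target
    reward + gamma * max_a' Q(s', a')."""
    max_q_next = max(q_table[s_next])            # max over the NEXT state
    td_error = reward + gamma * max_q_next - q_table[s][a]
    q_table[s][a] += alpha * td_error
    return q_table[s][a]
```

Comparing your update code term by term against a known-good version like this makes sign errors and missing terms easy to spot.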