I Added AI Playtesting Feedback to My Unity Prototype and It Found Things My Human Playtesters Completely Missed
Human playtesters are generous. They tell you what they liked and soften what they did not. They also cannot tell you precisely which moment their engagement dropped, or why they avoided a particular game mechanic for an entire session without realizing it. I spent eight weeks running an AI feedback layer in my Unity prototype that logs key gameplay events and sends them to an AI for analysis. Here is exactly what I built, what it found, and whether it was worth the time.
- OpenAI API: GPT-4o used to analyze gameplay event logs, approximately $4 per analysis session (platform.openai.com)
- Claude: used for interpreting patterns across multiple playtest sessions, Pro plan $20 per month (claude.ai)
- Supabase: database for storing gameplay event logs, free tier sufficient for prototype volumes (supabase.com)
Marcus Webb
June 24, 2026
What the AI found that human playtesters did not tell me: players consistently avoided the shield mechanic for the first three minutes of each session. Every human playtester who mentioned the shield said it was fine or good. The event logs showed that no player activated the shield in the first 180 seconds across 12 sessions, despite being in combat situations where it would have been useful. I asked three playtesters about this after seeing the data. None of them had consciously noticed they were avoiding it. The finding led to a tutorial prompt change, and early shield usage rose 35 percent in the next set of sessions.
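A check like the shield one is easy to express once a session's events are in a list. The sketch below is a minimal, Unity-free illustration rather than the code from this project: `LoggedEvent`, `UsedBefore`, and `SessionsAvoiding` are hypothetical names, and the fields are assumed to mirror the `eventType`, `detail`, and `sessionTime` values the logger records.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical plain-C# mirror of one logged event (no Unity dependency).
public sealed class LoggedEvent
{
    public string EventType = "";
    public string Detail = "";
    public float SessionTime; // seconds since session start
}

public static class EarlyUsageCheck
{
    // True if the mechanic was used at least once before cutoffSeconds.
    public static bool UsedBefore(
        IEnumerable<LoggedEvent> events, string mechanic, float cutoffSeconds) =>
        events.Any(e => e.EventType == "mechanic_used"
                     && e.Detail == mechanic
                     && e.SessionTime < cutoffSeconds);

    // Number of sessions in which the mechanic was never used before the cutoff.
    public static int SessionsAvoiding(
        IEnumerable<IEnumerable<LoggedEvent>> sessions, string mechanic, float cutoffSeconds) =>
        sessions.Count(s => !UsedBefore(s, mechanic, cutoffSeconds));
}
```

Calling `EarlyUsageCheck.SessionsAvoiding(sessions, "shield", 180f)` would then report how many sessions went the first three minutes without a shield activation.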
What I Actually Built
I built a lightweight event logging system in Unity that records specific gameplay events to a JSON file during each play session. Events include: which mechanic was used and when, which areas of the level the player visited and in what order, death locations, time spent in specific zones, and which items were picked up or ignored. At the end of a session the log is sent to GPT-4o with a structured prompt asking for specific design insights. The system took two weekends to build and the per-session analysis cost is about four dollars using GPT-4o. I use Claude to synthesise patterns across multiple sessions.
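The send step can be done by pasting the log into a chat window, but it can also be scripted. The sketch below shows one way to do that outside Unity with plain .NET and `System.Text.Json`; it is an assumption about how such a step could look, not the code used in this project. `AnalysisRequest` and its methods are hypothetical names. The placeholder string matches the one in the prompt template later in this post, and the body follows the `model` plus `messages` shape of the OpenAI Chat Completions endpoint.

```csharp
using System.Text.Json;

public static class AnalysisRequest
{
    // Placeholder used in the prompt template further down in this post.
    public const string Marker = "[PASTE SESSION JSON HERE]";

    // Substitutes the saved session log into the prompt template.
    public static string BuildPrompt(string template, string sessionJson) =>
        template.Replace(Marker, sessionJson);

    // Builds a JSON request body with a single user message.
    // It would be POSTed to https://api.openai.com/v1/chat/completions
    // with an "Authorization: Bearer <your key>" header (not done here).
    public static string BuildRequestBody(string prompt, string model = "gpt-4o")
    {
        var body = new
        {
            model,
            messages = new[] { new { role = "user", content = prompt } }
        };
        return JsonSerializer.Serialize(body);
    }
}
```

Keeping prompt assembly separate from the network call makes it possible to inspect the exact text, and its size, before paying for an analysis.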
The Event Logger Code
// GameplayEventLogger.cs
// Attach to a persistent GameObject in your scene
// Logs gameplay events to JSON for AI analysis
using System.Collections.Generic;
using System.IO;
using UnityEngine;

[System.Serializable]
public class GameplayEvent
{
    public string eventType;   // e.g. "mechanic_used", "player_death", "area_entered"
    public string detail;      // e.g. "shield", "zone_b", "health_pickup_ignored"
    public float sessionTime;  // seconds since session start
    public Vector2 position;   // world position when event occurred
}

[System.Serializable]
public class SessionLog
{
    public string sessionId;
    public string gameVersion;
    public float totalSessionTime;
    public List<GameplayEvent> events = new List<GameplayEvent>();
}

public class GameplayEventLogger : MonoBehaviour
{
    public static GameplayEventLogger Instance;

    private SessionLog currentSession;
    private float sessionStartTime;

    // Set these in Inspector
    public string gameVersion = "0.3.1";
    public bool loggingEnabled = true; // Disable for release builds

    void Awake()
    {
        // Keep a single logger alive across scene loads
        if (Instance != null) { Destroy(gameObject); return; }
        Instance = this;
        DontDestroyOnLoad(gameObject);
        InitSession();
    }

    void InitSession()
    {
        currentSession = new SessionLog
        {
            // First 8 characters of a GUID; Substring avoids relying on range syntax
            sessionId = System.Guid.NewGuid().ToString().Substring(0, 8),
            gameVersion = gameVersion
        };
        // Time.time is scaled game time, so it stops advancing while Time.timeScale is 0
        sessionStartTime = Time.time;
    }

    // Call this from anywhere in your game code
    // Examples:
    // GameplayEventLogger.Instance.Log("mechanic_used", "shield", transform.position);
    // GameplayEventLogger.Instance.Log("player_death", "fall_damage", transform.position);
    // GameplayEventLogger.Instance.Log("item_ignored", "health_potion", item.position);
    public void Log(string eventType, string detail, Vector2 position = default)
    {
        if (!loggingEnabled) return;
        currentSession.events.Add(new GameplayEvent
        {
            eventType = eventType,
            detail = detail,
            sessionTime = Time.time - sessionStartTime,
            position = position
        });
    }

    // Call this when the player ends a session or closes the game
    public void SaveSession()
    {
        // Do not write files when logging is off or no session was started
        if (!loggingEnabled || currentSession == null) return;
        currentSession.totalSessionTime = Time.time - sessionStartTime;
        string json = JsonUtility.ToJson(currentSession, true);
        string path = Path.Combine(Application.persistentDataPath,
            $"session_{currentSession.sessionId}.json");
        File.WriteAllText(path, json);
        Debug.Log($"Session saved to: {path}");
    }

    void OnApplicationQuit()
    {
        SaveSession();
    }
}
// To log a mechanic use from the shield controller
// (left shift is the shield key listed in the analysis prompt):
// void Update() {
//     if (Input.GetKeyDown(KeyCode.LeftShift)) {
//         GameplayEventLogger.Instance.Log(
//             "mechanic_used", "shield",
//             new Vector2(transform.position.x, transform.position.y));
//     }
// }

The AI Analysis Prompt That Produced Useful Insights
# The GPT-4o prompt I use after each playtest session
# I paste the session JSON directly into this prompt
---
You are a game design analyst reviewing a playtest session log
for a 2D action RPG prototype.
Game mechanics available to the player:
- Basic attack (space bar)
- Shield block (left shift)
- Dash (Q)
- Ranged attack (E, limited uses)
- Health pickup (walk over)
Level areas: Entry Zone, Forest Path, Combat Arena, Boss Approach
Here is the session event log:
[PASTE SESSION JSON HERE]
Please analyse this session and tell me:
1. Which mechanics did the player NOT use that they had available?
Flag any mechanic that was used less than twice in a session
longer than 5 minutes.
2. Were there any points where the player appeared to be confused
or stuck? Look for: same area revisited three or more times,
long gaps between events, deaths in the same location repeatedly.
3. What did the player spend the most time on?
Is this time allocation intentional based on game design goals?
4. Are there any items or pickups the player consistently ignored?
What might explain this behavior?
5. One hypothesis about a design change that could improve
engagement based only on this session data.
Be specific and data driven. Reference exact timestamps and
event counts where relevant. Do not pad the response with
general game design advice.
---
# After 4 or more sessions I use Claude Pro to synthesise patterns:
---
Here are analysis summaries from 6 playtest sessions of my prototype.
Each summary was generated from gameplay event logs.
[PASTE 6 SESSION SUMMARIES HERE]
Look across all 6 sessions and identify:
1. Behaviors that appear in 4 or more sessions (consistent patterns)
2. Mechanics that are being underused across sessions
3. The one design change that would address the most common issue
across multiple sessions
Be specific. Cite which sessions show each pattern.
---

What the Analysis Found That Surprised Me
- The shield avoidance pattern I described in the summary. Completely invisible to human feedback. Obvious in the event log when I knew to look for unused mechanics.
- 73 percent of player deaths across six sessions happened in the same spot. Not a difficulty spike, a visual clarity problem. A platform edge was casting a shadow that made its exact position ambiguous. One playtester had mentioned it felt a bit uneven in that area. The event log made it precise.
- The ranged attack mechanic was used almost exclusively in the first 90 seconds and then abandoned. The session analysis identified this pattern and hypothesised that the limited ammo counter was not visible enough to encourage conservation and reuse. I moved the counter and ranged attack usage increased in the next batch.
- Players spent 40 percent of their session time in the Forest Path area, which was designed to be a transition zone. The level was communicating the wrong thing about where the interesting content was. I resized and simplified the Forest Path zone, and session time shifted toward the Combat Arena as intended.
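Two of these findings, the death hotspot and the time spent per zone, are simple enough to compute directly from the event list before or alongside the AI analysis. The sketch below is one possible way to do that in plain C#, not the analysis used in this project. `LoggedEvent`, `SessionMetrics`, and the method names are hypothetical; the `player_death` and `area_entered` event types are assumed to match the logger, and the `radius` that defines "the same spot" is a value you would have to choose for your own level scale.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical plain-C# mirror of one logged event (no Unity dependency).
public sealed class LoggedEvent
{
    public string EventType = "";
    public string Detail = "";
    public float SessionTime; // seconds since session start
    public float X, Y;        // world position
}

public static class SessionMetrics
{
    // Largest share of deaths that fall within `radius` of any single death position.
    public static double DeathHotspotShare(IEnumerable<LoggedEvent> events, float radius)
    {
        var deaths = events.Where(e => e.EventType == "player_death").ToList();
        if (deaths.Count == 0) return 0.0;
        int best = deaths.Max(center => deaths.Count(d =>
        {
            float dx = d.X - center.X, dy = d.Y - center.Y;
            return dx * dx + dy * dy <= radius * radius;
        }));
        return (double)best / deaths.Count;
    }

    // Share of session time per zone. A zone visit lasts from its "area_entered"
    // event until the next one, or until the end of the session.
    public static Dictionary<string, double> ZoneTimeShare(
        IEnumerable<LoggedEvent> events, float totalSessionTime)
    {
        var entries = events.Where(e => e.EventType == "area_entered")
                            .OrderBy(e => e.SessionTime)
                            .ToList();
        var seconds = new Dictionary<string, double>();
        for (int i = 0; i < entries.Count; i++)
        {
            float end = i + 1 < entries.Count ? entries[i + 1].SessionTime : totalSessionTime;
            seconds.TryGetValue(entries[i].Detail, out double soFar);
            seconds[entries[i].Detail] = soFar + (end - entries[i].SessionTime);
        }
        return seconds.ToDictionary(
            kv => kv.Key,
            kv => totalSessionTime > 0 ? kv.Value / totalSessionTime : 0.0);
    }
}
```

Numbers like these do not explain why players behave a certain way, but they give the AI's hypotheses something concrete to be checked against.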
Mistakes Building This System
- Logging too much: My first version logged every frame the player was moving, every collision, and every animation state change. The resulting JSON was 8 megabytes per session and cost $18 in GPT-4o analysis because of the token count. Reduced to meaningful decision points only. Current logs are under 200 events per session and cost about $4 to analyze.
- Not labeling events clearly: Early event logs had entries like button_pressed without specifying which button. The AI analysis was vague in response. Adding specific detail strings like shield, dash, and ranged_attack made the analysis precise.
- Analyzing every session individually without cross-session synthesis: For the first four weeks I read individual session analyses without looking for patterns across sessions. The most valuable insights only appeared when I combined multiple sessions in Claude. I added the cross-session synthesis step in week five.
- Not validating hypotheses before acting on them: The AI generated hypotheses about why players were behaving a certain way. Some were right. Some were wrong but plausible sounding. I now make one design change at a time based on a hypothesis and run two more sessions before acting on the next one.
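The first two mistakes can be caught before a log is ever sent for analysis. The sketch below is a hypothetical pre-flight check in plain C#, not part of the system described here. `LoggedEvent`, `LogPreflight`, and the method names are made up for illustration. The 200-event default comes from the budget mentioned above, and the token estimate uses the common rule of thumb of roughly four characters per token, which is only a rough guide for JSON.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical plain-C# mirror of one logged event (no Unity dependency).
public sealed class LoggedEvent
{
    public string EventType = "";
    public string Detail = "";
}

public static class LogPreflight
{
    // Returns human-readable warnings; an empty list means both checks passed.
    public static List<string> Check(IReadOnlyList<LoggedEvent> events, int maxEvents = 200)
    {
        var warnings = new List<string>();

        // Mistake 1: logging too much.
        if (events.Count > maxEvents)
            warnings.Add($"{events.Count} events exceeds the budget of {maxEvents}");

        // Mistake 2: events without a specific detail label.
        int unlabeled = events.Count(e => string.IsNullOrWhiteSpace(e.Detail));
        if (unlabeled > 0)
            warnings.Add($"{unlabeled} events have no detail label");

        return warnings;
    }

    // Very rough size estimate, assuming about 4 characters per token.
    public static int RoughTokenEstimate(string json) => json.Length / 4;
}
```

Running a check like this on each saved session gives an early warning when a new logging call starts inflating the log or recording events the analysis cannot interpret.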
Final Thoughts
Eight weeks of AI-assisted playtest analysis found four significant design problems that six human playtest sessions had not surfaced clearly. The cost was about four dollars per session analysis and two weekends to build the logging system. Human playtesters are still essential for emotional response, subjective feel, and qualitative feedback. The AI analysis layer is for behavioral data that humans experience but cannot accurately describe. Both together are better than either alone.