AI Agent Memory Explained With Code Examples: Short-Term, Long-Term, Episodic, and Semantic Memory Patterns
AI agents have four distinct memory types and each serves a different purpose in agent behavior. This guide explains every memory type with working Python code, shows when each is appropriate, and documents the implementation mistakes that cause agents to forget critical context or waste tokens on irrelevant history.
Marcus Webb
June 19, 2026
Introduction
AI agents without well-designed memory forget what the user said five messages ago, repeat questions they already asked, and cannot improve over time from past interactions. Memory is not a single concept: it is four distinct systems that work together. Understanding which type to use for which situation is the difference between an agent that feels intelligent and one that feels amnesiac.
The Four Memory Types
- Short-term memory (working memory): the current conversation context window. Holds the active dialogue. Cleared at conversation end.
- Long-term memory: facts about a user or domain that persist across sessions. User preferences, past decisions, account information.
- Episodic memory: records of specific past interactions. What the user asked last Tuesday, what the agent responded, and what the outcome was.
- Semantic memory: generalized knowledge derived from many interactions. What do users in this category typically need? What patterns lead to successful task completion?
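The last two types in the list above can be sketched without any external services. What follows is a minimal in-memory illustration under stated assumptions, not a production design: the names `Episode`, `EpisodicMemory`, and `summarize_patterns` are hypothetical, retrieval is plain keyword matching (a real system would more likely use embeddings or full-text search), and semantic memory is reduced to per-topic success rates aggregated over recorded episodes.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Episode:
    """One past interaction: what was asked, what was answered, how it ended."""
    user_id: str
    topic: str      # hypothetical coarse label, e.g. "billing"
    question: str
    answer: str
    success: bool   # did the interaction resolve the user's task?
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class EpisodicMemory:
    """Append-only log of episodes with naive keyword search."""

    def __init__(self) -> None:
        self.episodes: list[Episode] = []

    def record(self, episode: Episode) -> None:
        self.episodes.append(episode)

    def search(self, user_id: str, query: str, limit: int = 3) -> list[Episode]:
        """Return this user's most recent episodes sharing a word with the query."""
        words = set(query.lower().split())
        hits = [
            e for e in self.episodes
            if e.user_id == user_id and words & set(e.question.lower().split())
        ]
        hits.sort(key=lambda e: e.timestamp, reverse=True)
        return hits[:limit]


def summarize_patterns(episodes: list[Episode]) -> dict[str, float]:
    """Semantic memory as an aggregate: success rate per topic across all users."""
    totals: dict[str, int] = defaultdict(int)
    wins: dict[str, int] = defaultdict(int)
    for e in episodes:
        totals[e.topic] += 1
        wins[e.topic] += int(e.success)
    return {topic: wins[topic] / totals[topic] for topic in totals}


memory = EpisodicMemory()
memory.record(Episode("u1", "billing", "why was my invoice doubled", "Refund issued", True))
memory.record(Episode("u1", "export", "csv export fails", "Could not reproduce", False))
memory.record(Episode("u2", "billing", "update my invoice address", "Address updated", True))

recent = memory.search("u1", "invoice question")   # episodic: specific past events
patterns = summarize_patterns(memory.episodes)     # semantic: generalized knowledge
```

The point of keeping the two apart is that `search` answers "what happened with this user?" while `summarize_patterns` answers "what tends to work in general?", and the two results would normally be injected into the prompt in separate, labeled sections.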
Implementation: All Four Memory Types
# Short-term memory: conversation buffer with sliding window
# Keeps recent exchanges in context, drops the oldest when the limit is approached
import tiktoken
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Message:
    role: str  # 'user' | 'assistant' | 'system'
    content: str
    # Timezone-aware UTC timestamp (datetime.utcnow is deprecated as of Python 3.12)
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    tokens: int = 0

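class ShortTermMemory:
    """
    Sliding window conversation buffer.
    Automatically trims oldest messages when approaching token limit.
    Always preserves the system message.
    """
    def __init__(
        self,
        max_tokens: int = 12_000,
        model: str = "gpt-4o",
        summarize_when_trimming: bool = True
    ):
        self.max_tokens = max_tokens
        self.model = model
        # NOTE: this flag is stored but not acted on in this snippet;
        # _trim_if_needed drops old messages without summarizing them.
        self.summarize_when_trimming = summarize_when_trimming
        self.messages: list[Message] = []
        self.system_msg: Message | None = None
        # Use the tokenizer that matches the model; fall back to cl100k_base
        # when tiktoken does not recognize the model name.
        try:
            self.enc = tiktoken.encoding_for_model(model)
        except KeyError:
            self.enc = tiktoken.get_encoding("cl100k_base")
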
    def set_system(self, content: str) -> None:
        self.system_msg = Message(
            role="system",
            content=content,
            tokens=len(self.enc.encode(content))
        )

    def add(self, role: str, content: str) -> None:
        msg = Message(
            role=role,
            content=content,
            tokens=len(self.enc.encode(content))
        )
        self.messages.append(msg)
        self._trim_if_needed()

    def _trim_if_needed(self) -> None:
        # The system message is never trimmed, so reserve its tokens up front
        system_tokens = self.system_msg.tokens if self.system_msg else 0
        available = self.max_tokens - system_tokens
        current_tokens = sum(m.tokens for m in self.messages)
        while current_tokens > available and len(self.messages) > 2:
            # Remove the oldest user-assistant pair
            removed_tokens = self.messages[0].tokens
            self.messages.pop(0)
            if self.messages and self.messages[0].role == "assistant":
                removed_tokens += self.messages[0].tokens
                self.messages.pop(0)
            current_tokens -= removed_tokens

    def get_context(self) -> list[dict]:
        messages = []
        if self.system_msg:
            messages.append({"role": self.system_msg.role, "content": self.system_msg.content})
        messages.extend([{"role": m.role, "content": m.content} for m in self.messages])
        return messages

    def clear(self) -> None:
        self.messages = []

# Long-term memory: persistent user facts stored in PostgreSQL
# Retrieves relevant facts before each conversation
import json
from typing import Optional
import psycopg2
from openai import OpenAI

client = OpenAI()

class LongTermMemory:
    """
    Stores and retrieves persistent facts about a user.
    Facts are structured key-value pairs with confidence scores.
    """
    def __init__(self, db_url: str):
        self.conn = psycopg2.connect(db_url)
        self._ensure_table()

    def _ensure_table(self):
        with self.conn.cursor() as cur:
            cur.execute("""
                CREATE TABLE IF NOT EXISTS user_memory (
                    id SERIAL PRIMARY KEY,
                    user_id VARCHAR(255) NOT NULL,
                    fact_key VARCHAR(255) NOT NULL,
                    fact_value TEXT NOT NULL,
                    confidence FLOAT DEFAULT 1.0,
                    source VARCHAR(100), -- 'explicit', 'inferred', 'feedback'
                    created_at TIMESTAMP DEFAULT NOW(),
                    updated_at TIMESTAMP DEFAULT NOW(),
                    UNIQUE(user_id, fact_key)
                )
            """)
        self.conn.commit()

    def store(self, user_id: str, fact_key: str, fact_value: str,
              confidence: float = 1.0, source: str = "explicit") -> None:
        """Store or update a fact about a user."""
        with self.conn.cursor() as cur:
            # On update, refresh source too so it describes the latest write
            cur.execute("""
                INSERT INTO user_memory (user_id, fact_key, fact_value, confidence, source)
                VALUES (%s, %s, %s, %s, %s)
                ON CONFLICT (user_id, fact_key)
                DO UPDATE SET fact_value = EXCLUDED.fact_value,
                              confidence = EXCLUDED.confidence,
                              source = EXCLUDED.source,
                              updated_at = NOW()
            """, (user_id, fact_key, fact_value, confidence, source))
        self.conn.commit()

    def retrieve(self, user_id: str, keys: Optional[list[str]] = None) -> dict:
        """Retrieve all or specific facts about a user."""
        with self.conn.cursor() as cur:
            if keys:
                # psycopg2 adapts a Python list to a PostgreSQL array for ANY()
                cur.execute(
                    "SELECT fact_key, fact_value, confidence FROM user_memory "
                    "WHERE user_id = %s AND fact_key = ANY(%s) AND confidence >= 0.5",
                    (user_id, keys)
                )
            else:
                cur.execute(
                    "SELECT fact_key, fact_value, confidence FROM user_memory "
                    "WHERE user_id = %s AND confidence >= 0.5 ORDER BY updated_at DESC LIMIT 20",
                    (user_id,)
                )
            # Fetch while the cursor is still open
            rows = cur.fetchall()
        return {row[0]: {"value": row[1], "confidence": row[2]} for row in rows}

    def extract_and_store_from_conversation(
        self, user_id: str, conversation: str
    ) -> list[dict]:
        """Use LLM to extract facts from conversation and store them."""
        # JSON mode returns an object, so the prompt asks for a "facts" key
        # that matches the .get("facts") lookup below.
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{
                "role": "user",
                "content": f"""
                Extract factual information about the user from this conversation.
                Return a JSON object with a single key "facts" holding an array of facts.
                Only include explicitly stated facts, not assumptions.
                Format: {{"facts": [{{"key": "preference_language", "value": "Python", "confidence": 1.0}}]}}
                Conversation:
                {conversation}
                """
            }],
            response_format={"type": "json_object"},
            temperature=0
        )
        facts = json.loads(response.choices[0].message.content).get("facts", [])
        for fact in facts:
            self.store(
                user_id=user_id,
                fact_key=fact["key"],
                fact_value=str(fact["value"]),  # column is TEXT; coerce non-string values
                confidence=fact.get("confidence", 0.8),
                source="inferred"
            )
        return facts

Common Mistakes
- Mistake 1: Including all conversation history in every context window. For long-running agents this fills the context with irrelevant old messages. Use a sliding window with summarization.
- Mistake 2: Storing everything in long-term memory. Facts with low confidence, or facts that change frequently, pollute the memory store. Set confidence thresholds and expiry dates.
- Mistake 3: Not distinguishing between memory types. Putting episodic memories (specific past events) in the same retrieval pool as semantic memories (generalized patterns) produces confusing context.
- Mistake 4: No memory update mechanism. Long-term facts that cannot be updated become stale. Implement confidence decay for old facts and allow explicit user correction.
- Mistake 5: Retrieving too much memory per context. Injecting 2,000 tokens of memory facts buries the actual user query. Retrieve only the most relevant 300 to 500 tokens of memory per turn.
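Mistakes 4 and 5 both come down to ranking facts before injecting them. The sketch below is one possible approach, not the only one. It assumes exponential decay with a hypothetical 90-day half-life, and it estimates tokens at roughly four characters each instead of calling a real tokenizer. The names `decayed_confidence`, `estimate_tokens`, and `select_within_budget` are illustrative, and the fact dictionaries are made-up sample data.

```python
from datetime import datetime, timedelta, timezone


def decayed_confidence(confidence: float, updated_at: datetime,
                       now: datetime, half_life_days: float = 90.0) -> float:
    """Halve a fact's confidence for every half_life_days since its last update."""
    age_days = max((now - updated_at).total_seconds() / 86_400, 0.0)
    return confidence * 0.5 ** (age_days / half_life_days)


def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about four characters per token."""
    return max(1, len(text) // 4)


def select_within_budget(facts: list[dict], now: datetime,
                         max_tokens: int = 400,
                         min_confidence: float = 0.5) -> list[dict]:
    """Keep the highest-confidence facts that fit inside the token budget."""
    scored = []
    for f in facts:
        score = decayed_confidence(f["confidence"], f["updated_at"], now)
        if score >= min_confidence:
            scored.append((score, f))
    scored.sort(key=lambda pair: pair[0], reverse=True)

    selected, used = [], 0
    for _score, f in scored:
        cost = estimate_tokens(f"{f['key']}: {f['value']}")
        if used + cost > max_tokens:
            continue  # this fact does not fit; a smaller one later still might
        selected.append(f)
        used += cost
    return selected


now = datetime(2026, 6, 19, tzinfo=timezone.utc)
facts = [
    {"key": "preference_language", "value": "Python", "confidence": 1.0,
     "updated_at": now - timedelta(days=10)},
    {"key": "timezone", "value": "UTC+2", "confidence": 0.9,
     "updated_at": now - timedelta(days=90)},  # one half-life old: 0.9 -> 0.45
    {"key": "editor", "value": "Vim", "confidence": 0.8,
     "updated_at": now - timedelta(days=1)},
]
chosen = [f["key"] for f in select_within_budget(facts, now)]
```

Decay is computed at read time here, so the stored confidence never has to be rewritten; an explicit user correction simply stores a new value with a fresh `updated_at`.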
FAQ
- Q: Which memory type should I implement first? A: Short-term memory (conversation buffer) is required for any multi-turn agent. Long-term memory adds the most user-visible value after that.
- Q: How do I handle user requests to delete their memory? A: Implement a function that clears the user_memory table for a given user_id. For GDPR compliance, this deletion must be complete and verifiable.
- Q: How much memory context should I inject per conversation turn? A: 300 to 500 tokens of retrieved memory facts per turn is sufficient for most use cases. More than that competes with the actual conversation for model attention.
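The deletion described in the FAQ can be sketched as a single function that deletes by `user_id` and then confirms that nothing remains. This example rests on several assumptions: it uses the standard-library `sqlite3` module with an in-memory database so that it runs without a server, the table is a reduced version of `user_memory`, and `delete_user_memory` is a hypothetical name. With psycopg2 the placeholder would be `%s` instead of `?`. It covers one table only; other stores that hold user data would need the same treatment.

```python
import sqlite3


def delete_user_memory(conn: sqlite3.Connection, user_id: str) -> int:
    """Delete every stored fact for user_id and verify none remain.

    Returns the number of rows deleted; raises if any row survives.
    """
    cur = conn.cursor()
    cur.execute("DELETE FROM user_memory WHERE user_id = ?", (user_id,))
    deleted = cur.rowcount
    conn.commit()

    # Verification step: re-query so the caller can log evidence of erasure.
    cur.execute("SELECT COUNT(*) FROM user_memory WHERE user_id = ?", (user_id,))
    remaining = cur.fetchone()[0]
    if remaining:
        raise RuntimeError(f"{remaining} rows still present for {user_id}")
    return deleted


# Demo against an in-memory table (reduced schema, sample data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_memory ("
    "user_id TEXT NOT NULL, fact_key TEXT NOT NULL, fact_value TEXT NOT NULL, "
    "UNIQUE(user_id, fact_key))"
)
conn.executemany(
    "INSERT INTO user_memory VALUES (?, ?, ?)",
    [("u1", "language", "Python"), ("u1", "editor", "Vim"), ("u2", "language", "Go")],
)
deleted = delete_user_memory(conn, "u1")
left = conn.execute("SELECT COUNT(*) FROM user_memory").fetchone()[0]
```

Returning the deleted row count gives the caller something concrete to record in an audit log, which is one way to make the deletion verifiable.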
Conclusion
AI agent memory is four distinct engineering challenges requiring four distinct solutions. Short-term memory is a sliding context window. Long-term memory is a structured facts database with confidence scoring. Episodic memory is a searchable record of past interactions. Semantic memory is generalized knowledge derived from aggregated patterns. Most production agents only need short-term and long-term memory to feel meaningfully intelligent to users.