AI Agent Memory Explained With Code Examples: Short-Term, Long-Term, Episodic, and Semantic Memory Patterns
developerGuide · 5 min read · 2,548

AI agents have four distinct memory types, and each serves a different purpose in agent behavior. This guide explains each type, gives working Python code for the short-term and long-term stores, shows when each type is appropriate, and documents the implementation mistakes that cause agents to forget critical context or waste tokens on irrelevant history.

Tools mentioned in this article

  • Redis (redis.io): used for short-term and episodic memory storage with TTL-based expiration
  • Pinecone (www.pinecone.io): vector database used for semantic long-term memory retrieval
  • PostgreSQL (www.postgresql.org): relational database used for structured long-term memory and user preference storage

Marcus Webb

June 19, 2026

Introduction

AI agents without well-designed memory forget what the user said five messages ago, repeat questions they already asked, and cannot improve over time from past interactions. Memory is not a single concept: it is four distinct systems that work together. Knowing which type to use in which situation is the difference between an agent that feels intelligent and one that feels amnesiac.

The Four Memory Types

  • Short-term memory (working memory): the current conversation context window. Holds the active dialogue. Cleared at conversation end.
  • Long-term memory: facts about a user or domain that persist across sessions. User preferences, past decisions, account information.
  • Episodic memory: records of specific past interactions. What the user asked last Tuesday, what the agent responded, and what the outcome was.
  • Semantic memory: generalized knowledge derived from many interactions. What do users in this category typically need? What patterns lead to successful task completion?

Implementation: All Four Memory Types

python
# Short-term memory: conversation buffer with sliding window
# Keeps recent exchanges in context, drops oldest when limit approached

import tiktoken
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Message:
    role:      str  # 'user' | 'assistant' | 'system'
    content:   str
    # Timezone-aware UTC timestamp (datetime.utcnow is deprecated since Python 3.12)
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    tokens:    int = 0

class ShortTermMemory:
    """
    Sliding window conversation buffer.
    Automatically trims oldest messages when approaching token limit.
    Always preserves the system message.
    """
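    def __init__(
        self,
        max_tokens:     int  = 12_000,
        model:          str  = "gpt-4o",
        summarize_when_trimming: bool = True
    ):
        self.max_tokens              = max_tokens
        self.model                   = model
        # NOTE: stored for callers; this class does not itself summarize trimmed messages
        self.summarize_when_trimming = summarize_when_trimming
        self.messages:    list[Message]    = []
        self.system_msg:  Message | None   = None
        # Use the tokenizer that matches the model; fall back for unknown model names
        try:
            self.enc = tiktoken.encoding_for_model(model)
        except KeyError:
            self.enc = tiktoken.get_encoding("cl100k_base")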
    
    def set_system(self, content: str) -> None:
        self.system_msg = Message(
            role="system",
            content=content,
            tokens=len(self.enc.encode(content))
        )
    
    def add(self, role: str, content: str) -> None:
        msg = Message(
            role=role,
            content=content,
            tokens=len(self.enc.encode(content))
        )
        self.messages.append(msg)
        self._trim_if_needed()
    
    def _trim_if_needed(self) -> None:
        system_tokens  = self.system_msg.tokens if self.system_msg else 0
        available      = self.max_tokens - system_tokens
        current_tokens = sum(m.tokens for m in self.messages)
        
        while current_tokens > available and len(self.messages) > 2:
            # Remove the oldest user-assistant pair
            removed_tokens = self.messages[0].tokens
            self.messages.pop(0)
            if self.messages and self.messages[0].role == "assistant":
                removed_tokens += self.messages[0].tokens
                self.messages.pop(0)
            current_tokens -= removed_tokens
    
    def get_context(self) -> list[dict]:
        messages = []
        if self.system_msg:
            messages.append({"role": self.system_msg.role, "content": self.system_msg.content})
        messages.extend([{"role": m.role, "content": m.content} for m in self.messages])
        return messages
    
    def clear(self) -> None:
        self.messages = []
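
The `summarize_when_trimming` flag above is stored but never acted on, so trimmed messages are simply lost. One possible way to fold them into a running summary is sketched below. Everything in it is an assumption for illustration: `SummarizingBuffer`, `count_tokens`, and `naive_summarizer` are hypothetical names, the word-count tokenizer stands in for `tiktoken`, and the placeholder summarizer stands in for an LLM call.

```python
from typing import Callable

Turn = tuple[str, str]  # (role, content)

def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: one token per whitespace-separated word."""
    return len(text.split())

def naive_summarizer(dropped: list[Turn]) -> str:
    """Placeholder summarizer: keeps the first six words of each dropped turn.
    A real agent would call an LLM here."""
    return " | ".join(f"{role}: {' '.join(content.split()[:6])}" for role, content in dropped)

class SummarizingBuffer:
    """Sliding window that folds trimmed turns into a running summary (hypothetical sketch)."""

    def __init__(self, max_tokens: int,
                 summarize: Callable[[list[Turn]], str] = naive_summarizer):
        self.max_tokens = max_tokens
        self.summarize = summarize
        self.turns: list[Turn] = []
        self.summary = ""  # running summary of everything trimmed so far

    def add(self, role: str, content: str) -> None:
        self.turns.append((role, content))
        dropped: list[Turn] = []
        # Drop the oldest turns until the window fits, always keeping the newest turn.
        while len(self.turns) > 1 and sum(count_tokens(c) for _, c in self.turns) > self.max_tokens:
            dropped.append(self.turns.pop(0))
        if dropped:
            # Feed the previous summary back in so earlier context is not lost.
            prior = [("summary", self.summary)] if self.summary else []
            self.summary = self.summarize(prior + dropped)

    def get_context(self) -> list[dict]:
        context = []
        if self.summary:
            context.append({"role": "system",
                            "content": f"Earlier conversation summary: {self.summary}"})
        context.extend({"role": r, "content": c} for r, c in self.turns)
        return context
```

In this sketch the summary is not counted against `max_tokens`; a production version would probably reserve part of the budget for it.
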
python
# Long-term memory: persistent user facts stored in PostgreSQL
# Retrieves relevant facts before each conversation

import json
from typing import Optional
import psycopg2
from openai import OpenAI

client = OpenAI()

class LongTermMemory:
    """
    Stores and retrieves persistent facts about a user.
    Facts are structured key-value pairs with confidence scores.
    """
    def __init__(self, db_url: str):
        self.conn = psycopg2.connect(db_url)
        self._ensure_table()
    
    def _ensure_table(self):
        with self.conn.cursor() as cur:
            cur.execute("""
                CREATE TABLE IF NOT EXISTS user_memory (
                    id          SERIAL PRIMARY KEY,
                    user_id     VARCHAR(255) NOT NULL,
                    fact_key    VARCHAR(255) NOT NULL,
                    fact_value  TEXT NOT NULL,
                    confidence  FLOAT DEFAULT 1.0,
                    source      VARCHAR(100),  -- 'explicit', 'inferred', 'feedback'
                    created_at  TIMESTAMP DEFAULT NOW(),
                    updated_at  TIMESTAMP DEFAULT NOW(),
                    UNIQUE(user_id, fact_key)
                )
            """)
            self.conn.commit()
    
    def store(self, user_id: str, fact_key: str, fact_value: str,
              confidence: float = 1.0, source: str = "explicit") -> None:
        """Store or update a fact about a user."""
        with self.conn.cursor() as cur:
            cur.execute("""
                INSERT INTO user_memory (user_id, fact_key, fact_value, confidence, source)
                VALUES (%s, %s, %s, %s, %s)
                ON CONFLICT (user_id, fact_key)
                DO UPDATE SET fact_value = EXCLUDED.fact_value,
                              confidence = EXCLUDED.confidence,
                              source     = EXCLUDED.source,
                              updated_at = NOW()
            """, (user_id, fact_key, fact_value, confidence, source))
            self.conn.commit()
    
    def retrieve(self, user_id: str, keys: list[str] | None = None) -> dict:
        """Retrieve all or specific facts about a user."""
        with self.conn.cursor() as cur:
            if keys:
                cur.execute(
                    "SELECT fact_key, fact_value, confidence FROM user_memory "
                    "WHERE user_id = %s AND fact_key = ANY(%s) AND confidence >= 0.5",
                    (user_id, keys)
                )
            else:
                cur.execute(
                    "SELECT fact_key, fact_value, confidence FROM user_memory "
                    "WHERE user_id = %s AND confidence >= 0.5 ORDER BY updated_at DESC LIMIT 20",
                    (user_id,)
                )
            rows = cur.fetchall()
        return {row[0]: {"value": row[1], "confidence": row[2]} for row in rows}
    
    def extract_and_store_from_conversation(
        self, user_id: str, conversation: str
    ) -> list[dict]:
        """Use LLM to extract facts from conversation and store them."""
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{
                "role": "user",
                "content": f"""
                Extract factual information about the user from this conversation.
                Return a JSON object with a single "facts" key holding an array of facts.
                Only include explicitly stated facts, not assumptions.
                
                Format: {{"facts": [{{"key": "preference_language", "value": "Python", "confidence": 1.0}}]}}
                
                Conversation:
                {conversation}
                """
            }],
            response_format={"type": "json_object"},
            temperature=0
        )
        facts = json.loads(response.choices[0].message.content).get("facts", [])
        
        for fact in facts:
            self.store(
                user_id=user_id,
                fact_key=fact["key"],
                fact_value=fact["value"],
                confidence=fact.get("confidence", 0.8),
                source="inferred"
            )
        return facts

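Semantic memory, the generalized knowledge derived from many interactions, is usually retrieved by similarity rather than by key. The sketch below shows the retrieval idea only, under loud assumptions: `embed` is a toy bag-of-words function standing in for a real embedding model, the list scan stands in for a vector database such as Pinecone, and `SemanticMemory` and its parameters are hypothetical names.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words counts. A real system would call an embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class SemanticMemory:
    """Stores generalized insights and retrieves the ones closest to a query."""

    def __init__(self):
        self._items: list[tuple[str, Counter]] = []

    def add(self, insight: str) -> None:
        self._items.append((insight, embed(insight)))

    def query(self, text: str, top_k: int = 2, min_score: float = 0.1) -> list[tuple[str, float]]:
        """Return up to top_k (insight, score) pairs scoring at least min_score, best first."""
        q = embed(text)
        scored = [(insight, cosine(q, vec)) for insight, vec in self._items]
        scored = [(insight, score) for insight, score in scored if score >= min_score]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]
```

The `min_score` cutoff matters as much as `top_k`: without it, an unrelated insight is still returned whenever nothing better exists.
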
Common Mistakes

  • Mistake 1: Including all conversation history in every context window. For long-running agents this fills the context with irrelevant old messages. Use a sliding window with summarization.
  • Mistake 2: Storing everything in long-term memory. Facts with low confidence, or facts that change frequently, pollute the memory store. Set confidence thresholds and expiry dates.
  • Mistake 3: Not distinguishing between memory types. Putting episodic memories (specific past events) in the same retrieval pool as semantic memories (general preferences) produces confusing context.
  • Mistake 4: No memory update mechanism. Long-term facts that cannot be updated become stale. Implement confidence decay for old facts and allow explicit user correction.
  • Mistake 5: Retrieving too much memory per context. Injecting 2000 tokens of memory facts buries the actual user query. Retrieve only the most relevant 300 to 500 tokens of memory per turn.
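
The confidence decay mentioned in Mistake 4 can be applied at read time, without rewriting stored rows. The following is a sketch under assumptions: exponential decay with a 90-day half-life is an illustrative choice, not a recommendation from the article, and `decayed_confidence` and `usable_facts` are hypothetical helpers. It also assumes each fact carries an `updated_at` value, which the `retrieve` query would need to select in addition to the columns it returns now.

```python
from datetime import datetime

def decayed_confidence(confidence: float, updated_at: datetime,
                       now: datetime, half_life_days: float = 90.0) -> float:
    """Halve the stored confidence for every half_life_days since the last update."""
    age_days = max((now - updated_at).total_seconds() / 86_400, 0.0)
    return confidence * 0.5 ** (age_days / half_life_days)

def usable_facts(facts: dict[str, dict], now: datetime,
                 threshold: float = 0.5, half_life_days: float = 90.0) -> dict[str, dict]:
    """Keep only facts whose decayed confidence is still at or above the threshold."""
    kept = {}
    for key, fact in facts.items():
        score = decayed_confidence(fact["confidence"], fact["updated_at"], now, half_life_days)
        if score >= threshold:
            kept[key] = {**fact, "confidence": round(score, 3)}
    return kept
```

Both datetimes must be of the same kind (both naive or both timezone-aware). An explicit user correction resets `updated_at`, which restores the fact's full confidence.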

FAQ

  • Q: Which memory type should I implement first? A: Short-term memory (conversation buffer) is required for any multi-turn agent. Long-term memory adds the most user-visible value after that.
  • Q: How do I handle user requests to delete their memory? A: Implement a function that deletes every user_memory row for a given user_id, and apply the same deletion to any episodic or semantic stores. For GDPR compliance, this deletion must be complete and verifiable.
  • Q: How much memory context should I inject per conversation turn? A: 300 to 500 tokens of retrieved memory facts per turn is sufficient for most use cases. More than that competes with the actual conversation for model attention.
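
The 300 to 500 token guideline above can be enforced with a small selection step before prompt assembly. The sketch below is one possible approach, assuming each candidate snippet already has a relevance score; `select_memory` and `count_tokens` are hypothetical names, and the word-count tokenizer is a rough stand-in for a real one.

```python
def count_tokens(text: str) -> int:
    """Rough stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())

def select_memory(candidates: list[tuple[str, float]], budget_tokens: int = 400) -> list[str]:
    """Greedily pick the highest-scoring snippets that fit within the token budget."""
    chosen: list[str] = []
    used = 0
    for text, _score in sorted(candidates, key=lambda c: c[1], reverse=True):
        cost = count_tokens(text)
        if used + cost > budget_tokens:
            continue  # this snippet would overflow; a smaller, lower-ranked one may still fit
        chosen.append(text)
        used += cost
    return chosen
```

Greedy selection is not optimal packing, but it keeps the highest-relevance facts first, which is usually what matters for a prompt.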

Conclusion

AI agent memory is four distinct engineering challenges requiring four distinct solutions. Short-term memory is a sliding context window. Long-term memory is a structured facts database with confidence scoring. Episodic memory is a searchable record of past interactions. Semantic memory is generalized knowledge derived from aggregated patterns. Most production agents only need short-term and long-term memory to feel meaningfully intelligent to users.
