
I Spent Four Months Building Long-Term Memory Into My AI Agents, and Here Is What Actually Worked and What Was Overengineered

I build automation workflows using AI agents for client work. The biggest complaint from clients was that the AI forgot context between sessions: every conversation started from zero. I spent four months implementing three different long-term memory approaches on real production agents. This is the honest breakdown: what each approach involved, which one held up in production, what I over-engineered, and the simple setup that solved 80% of the problem for 20% of the effort.

🔧 Tools mentioned in this article

Mem0 (mem0.ai): Memory layer for AI agents. Open source self-hosted or managed cloud; free tier, paid from $19/month (€17.50 / £15.00 / ₹1,580).

Pinecone (www.pinecone.io): Vector database for semantic memory retrieval. Free tier, paid from $70/month (€64.40 / £55.30 / ₹5,830).

Supabase (supabase.com): PostgreSQL database with pgvector extension. Free tier, paid from $25/month (€23 / £19.75 / ₹2,080).

OpenAI API (platform.openai.com): Used for embeddings and agent completions. Embedding costs approximately $0.02 per million tokens.

Priya Nair

June 23, 2026


Context: I build n8n and custom Python AI agents for small business clients. The use cases were a customer service agent for an e-commerce brand, a sales follow-up agent for a B2B SaaS company, and a personal assistant agent for an executive. All three clients complained about the same thing after two weeks of use: the AI did not remember previous conversations. I spent four months testing three memory approaches on these three live production agents. Stack: OpenAI API, Python, n8n, Supabase, Pinecone, and Mem0.

Why AI Agents Forget and What Long-Term Memory Actually Means

LLMs have no persistent memory by default. Each API call is stateless — the model knows only what you include in the current context window. Short-term memory (within a single conversation) is handled by including conversation history in the prompt. Long-term memory (between separate sessions, across days or weeks) requires an external storage system that retrieves relevant past information and injects it into the current context. The challenge is not building the storage — databases are straightforward. The challenge is retrieval: knowing which past information is relevant to the current conversation and injecting it without bloating the context with irrelevant history.
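To make the short-term versus long-term split concrete, here is a minimal sketch of how one stateless request might be assembled. The function name `build_messages`, the `max_history_turns` cap, and the sample strings are illustrative assumptions, not code from the agents described in this article.

```python
def build_messages(system_prompt: str,
                   retrieved_memories: list[str],
                   history: list[dict],
                   user_message: str,
                   max_history_turns: int = 6) -> list[dict]:
    """Assemble the message list for one stateless chat completion call."""
    system_content = system_prompt

    # Long-term memory: facts retrieved from external storage, injected as text.
    if retrieved_memories:
        memory_block = "\n".join(f"- {m}" for m in retrieved_memories)
        system_content += (
            "\n\nRelevant context from previous sessions:\n" + memory_block
        )

    # Short-term memory: the tail of the current conversation, capped so the
    # context window does not grow without bound.
    recent_history = history[-max_history_turns:] if max_history_turns > 0 else []

    return (
        [{"role": "system", "content": system_content}]
        + recent_history
        + [{"role": "user", "content": user_message}]
    )


messages = build_messages(
    system_prompt="You are a helpful assistant.",
    retrieved_memories=["Prefers email over phone"],
    history=[
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hello! How can I help?"},
    ],
    user_message="Can you follow up with me tomorrow?",
)
```

Everything the model "remembers" is in that list. The hard part, as noted above, is deciding what goes into `retrieved_memories`.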

Approach 1: Simple Key-Value Memory Store

python

# Approach 1: Simple Key-Value Memory — the first thing I built
# Storage: Supabase (PostgreSQL)
# Retrieval: exact match or category lookup
# Verdict: works for structured facts, fails for conversational context

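from typing import Optional

from supabase import create_client

supabase = create_client("YOUR_SUPABASE_URL", "YOUR_SUPABASE_KEY")

def save_memory(user_id: str, category: str, key: str, value: str):
    """Save a structured fact about a user."""
    # Assumes a unique constraint on (user_id, category, key). By default
    # upsert matches on the primary key, so with an auto-generated id every
    # call would insert a new row instead of updating the existing fact.
    supabase.table("agent_memory").upsert({
        "user_id": user_id,
        "category": category,  # e.g. 'preference', 'fact', 'history'
        "key": key,            # e.g. 'preferred_contact_time'
        "value": value,        # e.g. 'Tuesday afternoon'
        "updated_at": "now()"
    }, on_conflict="user_id,category,key").execute()

def get_memory(user_id: str, category: Optional[str] = None) -> str: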
    """Retrieve memories for a user, optionally filtered by category."""
    query = supabase.table("agent_memory").select("*").eq("user_id", user_id)
    if category:
        query = query.eq("category", category)
    result = query.execute()

    if not result.data:
        return ""

    memory_text = "What I know about this user:\n"
    for row in result.data:
        memory_text += f"- {row['key']}: {row['value']}\n"
    return memory_text

def build_prompt_with_memory(user_id: str, user_message: str) -> str:
    """Build an agent prompt that includes retrieved memory."""
    memory = get_memory(user_id)
    system_prompt = f"""You are a helpful assistant.

{memory}

Use this context naturally in your responses. Do not mention that you have
a memory system. Just use the information as a knowledgeable assistant would."""
    return system_prompt

# What this handled well:
# - User preferences ('prefers email over phone')
# - Company facts ('their product is called ProTrack, not ProTracker')
# - Standing instructions ('always suggest the Pro plan first')
#
# What it failed on:
# - Conversational continuity ('last week you said you were unhappy with...')
# - Connecting disparate topics from different sessions
# - Knowing which past facts were still relevant vs outdated
#
# The e-commerce client was happy with this for structured preferences.
# The executive assistant client found it too rigid — it could not
# recall conversational context, only structured facts.
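
The key-value behaviour can be tried without a database. The sketch below is a dict-backed stand-in for the `agent_memory` table; the class name `InMemoryKeyValueMemory` and its methods are hypothetical, and it only illustrates why the composite key (user, category, key) matters: saving the same key twice replaces the old value instead of accumulating both.

```python
from typing import Optional


class InMemoryKeyValueMemory:
    """Dict-backed stand-in for the agent_memory table (illustration only)."""

    def __init__(self) -> None:
        self._rows = {}  # maps (user_id, category, key) to value

    def save(self, user_id: str, category: str, key: str, value: str) -> None:
        # Same composite key means overwrite, which is what an upsert does
        # when the table has a matching unique constraint.
        self._rows[(user_id, category, key)] = value

    def render(self, user_id: str, category: Optional[str] = None) -> str:
        """Format stored facts the same way get_memory does."""
        lines = [
            f"- {key}: {value}"
            for (uid, cat, key), value in self._rows.items()
            if uid == user_id and (category is None or cat == category)
        ]
        if not lines:
            return ""
        return "What I know about this user:\n" + "\n".join(lines) + "\n"


store = InMemoryKeyValueMemory()
store.save("u1", "preference", "preferred_contact_method", "phone")
store.save("u1", "preference", "preferred_contact_method", "email")  # update
store.save("u1", "fact", "product_name", "ProTrack")
prompt_block = store.render("u1")
```

After the second save, only the updated preference is rendered into the prompt.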

Approach 2: Vector Embedding Semantic Memory

python

# Approach 2: Vector Semantic Memory — more powerful, more complex
# Storage: Supabase with pgvector extension (or Pinecone)
# Retrieval: semantic similarity search
# Verdict: excellent recall, higher cost, overengineered for simple agents

import json
from typing import Optional

from openai import OpenAI
from supabase import create_client

client = OpenAI(api_key="YOUR_KEY")
supabase = create_client("YOUR_URL", "YOUR_KEY")

def embed_text(text: str) -> list[float]:
    """Convert text to vector embedding using OpenAI."""
    response = client.embeddings.create(
        model="text-embedding-3-small",  # $0.02 per million tokens
        input=text
    )
    return response.data[0].embedding

def save_memory_vector(user_id: str, content: str, metadata: Optional[dict] = None):
    """Save a memory with its vector embedding for semantic retrieval."""
    embedding = embed_text(content)
    supabase.table("vector_memory").insert({
        "user_id": user_id,
        "content": content,
        "embedding": embedding,
        # A None default avoids sharing one mutable dict across calls.
        # If the column is jsonb, pass the dict itself instead of a string.
        "metadata": json.dumps(metadata or {}),
        "created_at": "now()"
    }).execute()

def retrieve_relevant_memories(user_id: str, query: str, top_k: int = 5) -> str:
    """
    Find the most semantically similar past memories to the current query.
    Requires pgvector extension in Supabase.
    SQL: SELECT content FROM vector_memory
         WHERE user_id = $1
         ORDER BY embedding <=> $2 LIMIT $3
    """
    query_embedding = embed_text(query)

    # pgvector cosine similarity search via Supabase RPC
    result = supabase.rpc("match_memories", {
        "query_embedding": query_embedding,
        "match_user_id": user_id,
        "match_count": top_k
    }).execute()

    if not result.data:
        return ""

    memories = "Relevant context from previous conversations:\n"
    for item in result.data:
        memories += f"- {item['content']}\n"
    return memories

# What this handled well:
# - 'Remember when we discussed the Q3 campaign?' → found semantically
#   similar content even with different phrasing
# - Connecting related topics across sessions
# - High-volume agents where exact-match lookup would miss context
#
# What it failed on:
# - Cost: embedding every utterance added $2-8/month in API costs per agent
# - Latency: embedding + vector search added 200-400ms per request
# - Storage management: memories accumulated without expiry logic
#   and old contradictory information persisted alongside new facts
#
# Used this for the executive assistant agent. Client noticed the
# improvement in contextual recall immediately. Also noticed the
# slightly slower response time.
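
For intuition about what the `match_memories` call is expected to do, here is a pure-Python sketch of the same ranking: filter by user, order by cosine distance (the quantity pgvector's `<=>` operator computes), and keep the closest `top_k`. The helper names and the toy 3-dimensional vectors are assumptions for illustration; real embeddings have far more dimensions and the search would run inside the database.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # assumption: treat zero vectors as unrelated
    return 1.0 - dot / (norm_a * norm_b)


def top_k_memories(query_embedding: list[float], rows: list[dict],
                   user_id: str, top_k: int = 5) -> list[str]:
    """Return the contents of the top_k closest memories for one user."""
    candidates = [r for r in rows if r["user_id"] == user_id]
    candidates.sort(key=lambda r: cosine_distance(query_embedding, r["embedding"]))
    return [r["content"] for r in candidates[:top_k]]


rows = [
    {"user_id": "u1", "content": "Discussed Q3 campaign budget", "embedding": [0.9, 0.1, 0.0]},
    {"user_id": "u1", "content": "Prefers email over phone", "embedding": [0.0, 1.0, 0.0]},
    {"user_id": "u2", "content": "Another user's note", "embedding": [1.0, 0.0, 0.0]},
]
closest = top_k_memories([1.0, 0.0, 0.0], rows, user_id="u1", top_k=1)
```

The user filter runs before ranking, so another user's memory is never returned even when its vector is the closest match.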

Approach 3: Mem0 Managed Memory Layer

What Mem0 is: An open-source and managed memory layer for AI agents that handles storage, retrieval, and memory management automatically. It stores facts extracted from conversations, reconciles contradictory information, and retrieves relevant memories for new queries. Available as a Python library for self-hosted use or as a managed cloud service.
How I used it: Replaced both custom approaches for the sales follow-up agent. Integration took 3 hours compared to 2 days for the vector memory build. Mem0 extracts memories from conversation text automatically, so I do not need to decide what to save. It saves structured facts, conversational context, and user preferences from each session.
What worked: The automatic extraction meant the agent started building relevant memory from day one without any manual curation. The managed deduplication handled cases where the user updated their contact preferences: Mem0 updated the stored fact rather than accumulating contradictory versions.
What did not work: Mem0's automatic extraction occasionally saved irrelevant information as a persistent memory. One agent stored 'user mentioned it was raining' as a fact in the permanent memory store. Memory quality control requires periodic review. The managed tier at $19/month is reasonable but adds a third-party dependency to the agent stack.
Verdict: For agents where getting good memory quickly matters more than cost control and full ownership, Mem0 managed is the fastest path to good long-term memory. For agents where cost control and full ownership matter, the vector approach with Supabase pgvector is more controllable.
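
A rough sketch of the integration shape, written against an interface so it runs without Mem0 installed. Mem0's open-source Python package documents a `Memory` class with `add(..., user_id=...)` and `search(..., user_id=...)`; the return format of `search` has differed between releases, so `normalize_results` is an assumption to check against the version you install. `MemoryLayer`, `recall`, `store_session`, and `FakeLayer` are hypothetical names, and `FakeLayer` is only a test double that does none of Mem0's extraction.

```python
from typing import Any, Protocol


class MemoryLayer(Protocol):
    """The two calls this sketch relies on."""

    def add(self, messages: Any, *, user_id: str) -> Any: ...

    def search(self, query: str, *, user_id: str) -> Any: ...


def normalize_results(raw: Any) -> list[str]:
    """Flatten search output to plain strings.

    ASSUMPTION: results are a list of dicts, or a dict with a "results"
    list, and each item carries its text under the "memory" key.
    """
    items = raw.get("results", []) if isinstance(raw, dict) else (raw or [])
    return [i["memory"] for i in items if isinstance(i, dict) and "memory" in i]


def recall(layer: MemoryLayer, user_id: str, query: str) -> list[str]:
    """Fetch memories relevant to the incoming message."""
    return normalize_results(layer.search(query, user_id=user_id))


def store_session(layer: MemoryLayer, user_id: str, messages: list[dict]) -> None:
    """Hand the finished session to the memory layer for extraction."""
    layer.add(messages, user_id=user_id)


class FakeLayer:
    """Test double: keeps user messages verbatim and returns all of them."""

    def __init__(self) -> None:
        self._store = {}  # maps user_id to a list of stored strings

    def add(self, messages: Any, *, user_id: str) -> None:
        texts = [m["content"] for m in messages if m.get("role") == "user"]
        self._store.setdefault(user_id, []).extend(texts)

    def search(self, query: str, *, user_id: str) -> dict:
        return {"results": [{"memory": t} for t in self._store.get(user_id, [])]}


layer = FakeLayer()
store_session(layer, "client-42", [
    {"role": "user", "content": "Please contact me by email from now on."},
    {"role": "assistant", "content": "Noted."},
])
recalled = recall(layer, "client-42", "How should I reach this client?")
```

Keeping the agent code behind a small interface like this also limits the cost of the third-party dependency: swapping the memory backend touches one object, not the whole agent.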

The Simple Setup That Solved 80% of the Problem

python

# The simplest memory approach that satisfied most client needs
# No vector database, no embedding costs, minimal complexity
# Works for agents where structured context matters most

import json
from datetime import datetime

# Single JSON file per user (or Supabase row for production)
def load_user_context(user_id: str) -> dict:
    """Load the user's persistent context object."""
    try:
        with open(f"context_{user_id}.json", "r") as f:
            return json.load(f)
    except FileNotFoundError:
        return {
            "user_id": user_id,
            "facts": [],           # key facts about the user
            "preferences": {},     # structured preferences
            "last_topics": [],     # last 5 topics discussed
            "open_items": [],      # pending items from last session
            "last_session": None
        }

def update_user_context(user_id: str, conversation_summary: str,
                        new_facts: list, ai_client) -> None:
    """
    After each session, use the AI to extract and update context.
    This runs once at session end, not during conversation.
    """
    context = load_user_context(user_id)

    update_prompt = f"""
    Current user context:
    {json.dumps(context, indent=2)}

    This session summary:
    {conversation_summary}

    New facts noted during this session:
    {json.dumps(new_facts)}

    Update the context object. Rules:
    - Add new facts to 'facts' list (max 20, remove oldest if over)
    - Update preferences if user stated any clearly
    - Set 'last_topics' to the main 3 topics from this session
    - Set 'open_items' to anything the user asked to follow up on
    - Remove resolved items from open_items

    Return ONLY valid JSON matching the context structure. No other text.
    """

    response = ai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": update_prompt}],
        response_format={"type": "json_object"},  # JSON mode: no fences or prose
        max_tokens=800
    )

    try:
        updated_context = json.loads(response.choices[0].message.content)
    except (json.JSONDecodeError, TypeError):
        return  # Keep existing context if the reply is empty or not valid JSON

    if not isinstance(updated_context, dict):
        return  # Keep existing context if the reply is not a JSON object

    # Enforce the 20-fact cap in code as well; the model may not follow it
    facts = updated_context.get("facts")
    updated_context["facts"] = facts[-20:] if isinstance(facts, list) else context["facts"]
    updated_context["last_session"] = datetime.now().isoformat()
    with open(f"context_{user_id}.json", "w") as f:
        json.dump(updated_context, f, indent=2)

def build_prompt_with_context(user_id: str, user_message: str) -> str:
    """Build agent system prompt with user context injected."""
    context = load_user_context(user_id)

    context_summary = ""
    # .get() guards against keys the model dropped during an update
    if context.get("facts"):
        context_summary += "What I know: " + "; ".join(map(str, context["facts"][-10:])) + "\n"
    if context.get("open_items"):
        context_summary += "Pending from last session: " + "; ".join(map(str, context["open_items"])) + "\n"
    if context.get("last_topics"):
        context_summary += "Last talked about: " + ", ".join(map(str, context["last_topics"]))

    return f"""You are a helpful assistant.\n\n{context_summary}"""

# Cost: ~$0.001 per session end update using GPT-4o-mini
# Latency: zero added to real-time conversation (runs post-session)
# Coverage: 80% of what clients needed from memory
# Missing: semantic recall of specific past statements
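
Because the updated context comes back from a model, it can help to check its shape before saving it. The sketch below is one way to do that under stated assumptions: the helper name `normalize_context` and the `EXPECTED_TYPES` table are hypothetical, and the limits mirror the rules in the update prompt (20 facts, 3 topics).

```python
EXPECTED_TYPES = {
    "facts": list,
    "preferences": dict,
    "last_topics": list,
    "open_items": list,
}


def normalize_context(candidate, previous: dict,
                      max_facts: int = 20, max_topics: int = 3) -> dict:
    """Merge a model-produced context into the previous one, field by field.

    A field is accepted only if it has the expected type; otherwise the
    previous value is kept. Length limits are enforced in code.
    """
    if not isinstance(candidate, dict):
        return previous

    cleaned = dict(previous)
    for key, expected_type in EXPECTED_TYPES.items():
        value = candidate.get(key)
        if isinstance(value, expected_type):
            cleaned[key] = value

    # Keep the newest facts and the first few topics, coerced to strings
    cleaned["facts"] = [str(f) for f in cleaned.get("facts", [])][-max_facts:]
    cleaned["last_topics"] = [str(t) for t in cleaned.get("last_topics", [])][:max_topics]
    return cleaned


previous = {
    "user_id": "u1",
    "facts": ["Uses the Pro plan"],
    "preferences": {"contact": "email"},
    "last_topics": [],
    "open_items": [],
    "last_session": None,
}
candidate = {
    "facts": [f"fact {i}" for i in range(25)],
    "preferences": "oops, not an object",
    "last_topics": ["billing", "onboarding", "renewal", "misc"],
    "open_items": ["Send invoice copy"],
}
result = normalize_context(candidate, previous)
```

A malformed field falls back to the previous value instead of discarding the whole update.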

Mistakes I Made Building Agent Memory

Mistake 1: Building vector memory first because it seemed most sophisticated. I spent two weeks on the vector implementation before testing the simple key-value approach. The simple approach solved the e-commerce client's entire memory requirement. The two weeks on vector memory would have been better spent on client work.
Mistake 2: No memory expiry logic. Both custom approaches accumulated memories indefinitely, and old preferences contradicted updated preferences. I implemented a 90-day expiry on non-essential facts and a review flag for any fact older than 30 days.
Mistake 3: Injecting too much memory into the context window. An early version injected all retrieved memories into every request. For users with 3 months of history, the context bloated significantly and pushed costs up. I now inject only the top 5 most relevant items and the last session summary.
Mistake 4: Not handling the case where memory contradicts the user's current statement. The agent would state a saved preference ('you said you prefer email') when the user had changed that preference. I added logic to treat anything the user says in the current session as overriding saved memory for that session.
Mistake 5: No client-facing memory transparency. Clients were unaware the agent was using stored memory. One client was surprised when the agent referenced a conversation from three weeks earlier. I added a brief note at session start: 'I have context from your previous sessions.' Transparency about memory use is important for user trust.
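
The fixes for Mistakes 2 and 4 can be sketched in a few lines. This is an illustration under assumptions, not the production code: the memory record shape (`key`, `value`, `updated_at`, `essential`) and the function names are hypothetical, and how the agent detects a preference stated in the current session is out of scope here.

```python
from datetime import datetime, timedelta, timezone

EXPIRY = timedelta(days=90)        # non-essential facts older than this are dropped
REVIEW_AFTER = timedelta(days=30)  # kept facts older than this are flagged


def triage_memories(memories: list[dict], now: datetime):
    """Return (kept, needs_review) after applying expiry and review rules."""
    kept, needs_review = [], []
    for memory in memories:
        age = now - memory["updated_at"]
        if not memory.get("essential", False) and age > EXPIRY:
            continue  # expired: do not inject or keep
        kept.append(memory)
        if age > REVIEW_AFTER:
            needs_review.append(memory)
    return kept, needs_review


def apply_session_overrides(memories: list[dict], session_overrides: dict) -> dict:
    """Anything stated in the current session wins over the stored value."""
    merged = {m["key"]: m["value"] for m in memories}
    merged.update(session_overrides)
    return merged


now = datetime(2026, 6, 23, tzinfo=timezone.utc)
memories = [
    {"key": "contact", "value": "email", "updated_at": now - timedelta(days=10)},
    {"key": "old_promo", "value": "spring sale", "updated_at": now - timedelta(days=120)},
    {"key": "company_name", "value": "ProTrack", "updated_at": now - timedelta(days=200), "essential": True},
    {"key": "timezone", "value": "IST", "updated_at": now - timedelta(days=45)},
]
kept, needs_review = triage_memories(memories, now)
effective = apply_session_overrides(kept, {"contact": "phone"})
```

The override is applied at prompt-build time only, so the stored value stays untouched until the end-of-session update decides whether the change is permanent.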

What Memory Approach to Use and When

Simple JSON context (no extra cost beyond storage): use for agents where structured preferences and recent session continuity are the primary needs. Customer service agents, FAQ agents, simple personal assistants. Covers 80% of memory requirements for 20% of the build effort.
Vector semantic memory (Supabase pgvector ~$25/month + embedding costs): use for agents where users ask questions that implicitly reference past conversations — executive assistants, complex support agents, research assistants. Higher build complexity, better recall quality.
Mem0 managed ($19/month): use when build time matters more than cost control and full ownership. Fastest path to production-quality memory with automatic extraction. Adds third-party dependency.
No memory: still the right choice for stateless transactional agents — form-filling, one-shot queries, data transformation tasks where session continuity is irrelevant.

Final Thoughts

Four months of building agent memory confirmed what the first two months of client feedback already implied: the missing feature was not sophisticated semantic recall — it was basic session continuity and structured fact storage. The simple JSON context approach solved the customer service client's entire complaint in one afternoon of work. The vector approach was necessary for the executive assistant agent but overengineered for the simpler use cases. Start with the simplest memory approach that addresses your specific agent's actual recall needs. Add sophistication only when the simple approach demonstrably fails.
