● lab 03 | ~4 min

The three-tier model, live.

Letta is an agent runtime where memory has structure: three tiers, each a different storage class. Core is in context every turn. Recall is the searchable conversation history. Archival is the long-term vector store the agent reaches for with a tool call. This lab spins up an agent, talks to it once, then inspects the tiers.

tier 1

Core

Always in context. Two blocks by default: "human" (what the agent knows about you) and "persona" (who the agent is). Cheapest read, because it is already in the prompt. Most expensive write, because whatever you store here rides along on every turn.

tier 2

Recall

Every message in every conversation, searchable. Not in context unless the agent decides to look. Think "the chat log."

tier 3

Archival

Vector store the agent writes to and queries with a tool call. The dump for everything that doesn't earn a slot in Core.
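
To make the split concrete, here is a toy in-memory model of the three tiers. It is not how Letta stores anything: the class name ToyMemory, its methods, the sample facts, and the word-overlap score standing in for vector similarity are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMemory:
    """Toy stand-in for the three tiers; not Letta's implementation."""
    core: dict = field(default_factory=dict)      # tier 1: label -> text
    recall: list = field(default_factory=list)    # tier 2: (role, text) log
    archival: list = field(default_factory=list)  # tier 3: stored passages

    def build_prompt(self, user_message: str) -> str:
        # Core is pasted into every prompt; Recall and Archival are not.
        blocks = "\n".join(f"<{k}>{v}</{k}>" for k, v in self.core.items())
        return f"{blocks}\nuser: {user_message}"

    def log(self, role: str, text: str) -> None:
        # Every message lands in Recall without anyone deciding to store it.
        self.recall.append((role, text))

    def search_recall(self, needle: str) -> list:
        # Recall is only read when something explicitly searches it.
        return [(r, t) for r, t in self.recall if needle.lower() in t.lower()]

    def archival_insert(self, text: str) -> None:
        # Archival only grows on an explicit write.
        self.archival.append(text)

    def archival_search(self, query: str, top_k: int = 1) -> list:
        # Word overlap stands in for embedding similarity here.
        q = set(query.lower().split())
        ranked = sorted(
            self.archival,
            key=lambda p: len(q & set(p.lower().split())),
            reverse=True,
        )
        return ranked[:top_k]


mem = ToyMemory(core={"human": "The user is Rayyan.", "persona": "You remember."})
mem.log("user", "My favorite editor is Vim.")
mem.archival_insert("The user's favorite editor is Vim.")
mem.archival_insert("The user is building agents in SF.")

print(mem.build_prompt("hi"))
print(mem.search_recall("vim"))
print(mem.archival_search("favorite editor"))
```

The point of the toy: build_prompt only ever touches core, log fills recall as a side effect of talking, and archival changes only when archival_insert is called.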

step 1

Install + sign up.

Letta runs as cloud (free tier, fastest path for tonight) or self-hosted. We use cloud. Grab a token at app.letta.com, install the client:

install

pip install letta-client

Then set the token. Cloud token, NOT an OpenAI key (Letta brokers the LLM call for you on the free tier).

bash | zsh

export LETTA_API_KEY="your-letta-token"

powershell

$env:LETTA_API_KEY = "your-letta-token"

cmd.exe

set LETTA_API_KEY=your-letta-token
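
If you would rather fail early than hit a 401 later, a small check like this can sit at the top of the script. The helper name require_token is hypothetical; it only reads the environment and never calls Letta.

```python
import os


def require_token(env=os.environ, name="LETTA_API_KEY"):
    """Return the token, or raise with a readable message if it is unset."""
    token = (env.get(name) or "").strip()
    if not token:
        raise RuntimeError(f"{name} is not set; export it before running the lab")
    return token


if __name__ == "__main__":
    try:
        token = require_token()
        # Print only a short prefix so the full token never lands in a log.
        print("token loaded:", token[:4] + "...")
    except RuntimeError as err:
        print("error:", err)
```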

step 2

Create the agent.

Seed the two Core blocks at creation time. Pick a cheap model for the demo; gpt-4o-mini is a reasonable default.

letta_demo.py

import os
import sys
sys.stdout.reconfigure(encoding="utf-8")

from letta_client import Letta

client = Letta(token=os.environ["LETTA_API_KEY"])

agent = client.agents.create(
    memory_blocks=[
        {"label": "human", "value": "The user is Rayyan, building agents in SF."},
        {"label": "persona", "value": "You are a helpful agent that remembers."},
    ],
    model="openai/gpt-4o-mini",
    embedding="openai/text-embedding-3-small",
)

print("agent id:", agent.id)

step 3

Talk to it. Then read the tiers.

Append the rest of the file. One message in, one response out, then dump each tier so you can see where things landed.

letta_demo.py (continued)

resp = client.agents.messages.create(
    agent_id=agent.id,
    messages=[
        {"role": "user", "content": "What do you know about me?"},
    ],
)

# Letta returns a list of messages (reasoning + assistant + tool calls).
# Grab the final assistant message.
final = [m for m in resp.messages if m.message_type == "assistant_message"]
if final:
    print("agent:", final[-1].content)

# --- inspect the tiers ---

# tier 1: Core blocks
blocks = client.agents.blocks.list(agent_id=agent.id)
print("\n--- CORE ---")
for b in blocks:
    print(f"  [{b.label}] {b.value[:80]}")

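# tier 2: Recall (message history)
msgs = client.agents.messages.list(agent_id=agent.id, limit=10)
print(f"\n--- RECALL ({len(msgs)} messages) ---")
for m in msgs[:4]:
    # Messages are tagged by message_type (as used above); fall back to role.
    kind = getattr(m, "message_type", None) or getattr(m, "role", "?")
    role = kind.replace("_message", "")  # "user_message" -> "user"
    content = getattr(m, "content", "") or ""
    print(f"  [{role}] {content[:80]}")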

# tier 3: Archival (vector store)
passages = client.agents.passages.list(agent_id=agent.id, limit=10)
print(f"\n--- ARCHIVAL ({len(passages)} passages) ---")
for p in passages:
    print(f"  {p.text[:80]}")
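
The response handling at the top of this listing can be pulled into a helper that is easy to test offline. This is a sketch: last_assistant_text is a hypothetical name, the stand-in objects only mimic the message_type and content fields the lab code reads, and the handling of non-string content is an assumption about how multi-part content might look.

```python
from types import SimpleNamespace


def last_assistant_text(messages):
    """Return the text of the last assistant message, or None if there is none."""
    for m in reversed(messages):
        if getattr(m, "message_type", None) == "assistant_message":
            content = getattr(m, "content", "") or ""
            if isinstance(content, str):
                return content
            # Assumption: non-string content is a list of parts with a .text field.
            return "".join(getattr(part, "text", "") for part in content)
    return None


# Stand-in objects shaped like the lab's response; no network call is made.
fake_messages = [
    SimpleNamespace(message_type="reasoning_message", reasoning="Check the human block."),
    SimpleNamespace(message_type="assistant_message", content="You're Rayyan."),
]
print(last_assistant_text(fake_messages))
```

Returning None instead of raising keeps the caller's "if final:" style check from the lab intact.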

Run:

terminal

python letta_demo.py

expected output

agent id: agent-7f3e...
agent: You're Rayyan, building agents in SF. What
       would you like to work on today?

--- CORE ---
  [human] The user is Rayyan, building agents in SF.
  [persona] You are a helpful agent that remembers.

--- RECALL (3 messages) ---
  [system] You are a helpful agent that remembers...
  [user] What do you know about me?
  [assistant] You're Rayyan, building agents in SF...

--- ARCHIVAL (0 passages) ---

what to notice

Core is populated (the agent answered from "human"). Recall has three messages (system + user + assistant). Archival is empty. Archival doesn't fill on its own: archival_memory_insert is exposed to the model as a tool, and the agent has to decide to call it as part of the Letta loop.

Now give the agent a long fact it has reason to store ("here's a 200-word backstory: ..."), then run the inspect block again. You should see a passage land in Archival.
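
One way to run that experiment is to diff Archival before and after a single message. The sketch below reuses only the two client calls already in letta_demo.py; the helper names archival_texts and tell_and_diff are made up for this lab.

```python
def archival_texts(client, agent_id, limit=10):
    """List passage texts for one agent, using the same call as the lab."""
    passages = client.agents.passages.list(agent_id=agent_id, limit=limit)
    return [p.text for p in passages]


def tell_and_diff(client, agent_id, fact):
    """Send one user message and return the passages that appeared afterwards."""
    before = set(archival_texts(client, agent_id))
    client.agents.messages.create(
        agent_id=agent_id,
        messages=[{"role": "user", "content": fact}],
    )
    # Anything listed now that was not listed before was written this turn.
    return [t for t in archival_texts(client, agent_id) if t not in before]
```

With the lab's objects in scope, call tell_and_diff(client, agent.id, your_backstory). An empty list means the agent chose not to write to Archival on that turn, which is a valid outcome; try asking it to store the fact explicitly.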

troubleshooting

401 from the cloud. Token didn't load. Verify with python -c "import os; print(os.environ.get('LETTA_API_KEY','MISSING')[:8])".

Empty Recall. The free tier paginates. Pass limit=50 and check again.

"model not available." The free tier has a model allow-list. Try "letta/letta-free" as the model arg if gpt-4o-mini gets refused.

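If you hit the model allow-list often, the retry can be scripted. A minimal sketch, assuming the refusal surfaces as an exception from the create call: create_with_fallback is a hypothetical helper, and it catches Exception broadly because no specific SDK error class is assumed here.

```python
def create_with_fallback(create, models, **kwargs):
    """Try each model name in order; return (agent, model) for the first that works."""
    last_error = None
    for model in models:
        try:
            return create(model=model, **kwargs), model
        except Exception as err:  # Broad on purpose; the SDK's error type is not assumed.
            last_error = err
    raise RuntimeError(f"no model accepted: {models}") from last_error
```

In the lab script this would be called as create_with_fallback(client.agents.create, ["openai/gpt-4o-mini", "letta/letta-free"], memory_blocks=..., embedding=...), passing the same memory_blocks and embedding values as step 2. Because the catch is broad, a bad token would also trigger the fallback, so check the chained error if every model fails.
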