Letta is an agent runtime where memory has structure: three tiers, each a different storage class. Core is in context on every turn, Recall is the searchable conversation history, and Archival is the long-term vector store the agent reaches with a tool call. This lab spins up an agent, talks to it once, then inspects the tiers.
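To make the three tiers concrete before touching the real service, here is a toy model in plain Python. It is a sketch of the idea only, not Letta's implementation: `ToyMemory`, its methods, and the substring search that stands in for real retrieval are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMemory:
    """Toy stand-in for the three tiers; not how Letta stores anything."""
    core: dict = field(default_factory=dict)      # pasted into every prompt
    recall: list = field(default_factory=list)    # full history, searched on demand
    archival: list = field(default_factory=list)  # grows only on explicit insert

    def build_prompt(self, user_message: str) -> str:
        # Every message is logged to recall; core blocks ride along each turn.
        self.recall.append(user_message)
        blocks = "\n".join(f"[{label}] {value}" for label, value in self.core.items())
        return f"{blocks}\n[user] {user_message}"

    def search_recall(self, query: str) -> list:
        # Substring match here; a real system would use a proper search index.
        return [m for m in self.recall if query.lower() in m.lower()]

    def archival_insert(self, text: str) -> None:
        # Nothing lands here unless something calls this method.
        self.archival.append(text)


memory = ToyMemory(core={"human": "The user is Rayyan, building agents in SF."})
prompt = memory.build_prompt("What do you know about me?")
print(prompt)
print(memory.search_recall("know"))
print(len(memory.archival))
```

After one turn the toy shows the same shape the lab produces: core is in the prompt, recall holds the message, and archival is still empty because nothing asked to write to it.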
Letta runs as cloud (free tier, fastest path for tonight) or self-hosted. We use cloud. Grab a token at app.letta.com, install the client:
pip install letta-client
Then set the token, using the line that matches your shell (bash/zsh, then PowerShell, then cmd). Use the Letta cloud token, NOT an OpenAI key (Letta brokers the LLM call for you on the free tier).
export LETTA_API_KEY="your-letta-token"
$env:LETTA_API_KEY = "your-letta-token"
set LETTA_API_KEY=your-letta-token
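A quick way to confirm the variable actually reached Python is to fail fast with a readable error rather than wait for the service to reject the request. The sketch below assumes nothing about Letta itself; `load_token` and `masked` are hypothetical helper names, and only the `LETTA_API_KEY` variable name comes from this lab.

```python
import os


def load_token(env=os.environ, name="LETTA_API_KEY"):
    """Return the token, or raise a readable error if it is unset or blank."""
    token = env.get(name, "").strip()
    if not token:
        raise RuntimeError(f"{name} is not set; export it in this shell first")
    return token


def masked(token, visible=4):
    """Show only the first few characters so the value is safe to print."""
    return token[:visible] + "*" * max(len(token) - visible, 0)


if __name__ == "__main__":
    try:
        print("token loaded:", masked(load_token()))
    except RuntimeError as err:
        print("error:", err)
```

Run it in the same terminal where you set the variable; a new terminal window will not inherit it.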
Create letta_demo.py and seed the two Core blocks ("human" and "persona") at creation time. Pick a cheap model for the demo; gpt-4o-mini is a safe default.
import os
import sys
sys.stdout.reconfigure(encoding="utf-8")
from letta_client import Letta
client = Letta(token=os.environ["LETTA_API_KEY"])
agent = client.agents.create(
    memory_blocks=[
        {"label": "human", "value": "The user is Rayyan, building agents in SF."},
        {"label": "persona", "value": "You are a helpful agent that remembers."},
    ],
    model="openai/gpt-4o-mini",
    embedding="openai/text-embedding-3-small",
)
print("agent id:", agent.id)
Append the rest of the file. One message in, one response out, then dump each tier so you can see where things landed.
resp = client.agents.messages.create(
    agent_id=agent.id,
    messages=[
        {"role": "user", "content": "What do you know about me?"},
    ],
)
# Letta returns a list of messages (reasoning + assistant + tool calls).
# Grab the final assistant message.
final = [m for m in resp.messages if m.message_type == "assistant_message"]
if final:
    print("agent:", final[-1].content)
# --- inspect the tiers ---
# tier 1: Core blocks
blocks = client.agents.blocks.list(agent_id=agent.id)
print("\n--- CORE ---")
for b in blocks:
    print(f" [{b.label}] {b.value[:80]}")
# tier 2: Recall (message history)
msgs = client.agents.messages.list(agent_id=agent.id, limit=10)
print(f"\n--- RECALL ({len(msgs)} messages) ---")
for m in msgs[:4]:
    # getattr guards against message types that lack these fields
    role = getattr(m, "role", "?")
    content = getattr(m, "content", "") or ""
    print(f" [{role}] {content[:80]}")
# tier 3: Archival (vector store)
passages = client.agents.passages.list(agent_id=agent.id, limit=10)
print(f"\n--- ARCHIVAL ({len(passages)} passages) ---")
for p in passages:
    print(f" {p.text[:80]}")
Run:
python letta_demo.py
agent id: agent-7f3e...
agent: You're Rayyan, building agents in SF. What would you like to work on today?
--- CORE ---
[human] The user is Rayyan, building agents in SF.
[persona] You are a helpful agent that remembers.
--- RECALL (3 messages) ---
[system] You are a helpful agent that remembers...
[user] What do you know about me?
[assistant] You're Rayyan, building agents in SF...
--- ARCHIVAL (0 passages) ---
Core is populated (the agent answered from the "human" block). Recall has three messages (system + user + assistant). Archival is empty because it doesn't fill on its own: the agent has to decide to call archival_memory_insert, which Letta exposes to the model as a tool, so that decision is made inside the agent loop.
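Because an Archival write is a tool call, one way to see whether the agent made it is to scan the response messages for tool-call entries. The helper below is a sketch under assumptions: it assumes tool-call messages carry message_type "tool_call_message" and a tool_call object with a name field, which you should verify against the client version you installed. `tool_calls_made` is a hypothetical name, and the stand-in messages exist only so the example runs offline.

```python
from types import SimpleNamespace


def tool_calls_made(messages):
    """Return the names of the tools the agent called, in order."""
    names = []
    for m in messages:
        # Assumed shape: message_type == "tool_call_message", m.tool_call.name
        if getattr(m, "message_type", None) != "tool_call_message":
            continue
        name = getattr(getattr(m, "tool_call", None), "name", None)
        if name:
            names.append(name)
    return names


# Stand-in objects shaped like the assumed response messages.
fake_messages = [
    SimpleNamespace(message_type="reasoning_message"),
    SimpleNamespace(
        message_type="tool_call_message",
        tool_call=SimpleNamespace(name="archival_memory_insert"),
    ),
    SimpleNamespace(message_type="assistant_message", content="Saved."),
]
print(tool_calls_made(fake_messages))  # ['archival_memory_insert']
```

Against a live agent you would pass resp.messages instead of the stand-ins; an empty list means the model answered without touching any tool on that turn.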
Ask the agent something that forces it to remember a long fact ("here's a 200-word backstory: ..."). Run the inspect block again. You should see a passage land in Archival.
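The exercise can be wrapped in a small function so it is easy to repeat with different facts. This is a sketch that reuses only the two client calls already shown in this lab (client.agents.messages.create and client.agents.passages.list); the function name `teach_and_inspect` and the wording of the prompt are assumptions, and the model is not guaranteed to choose the archival tool on any given run.

```python
def teach_and_inspect(client, agent_id, fact):
    """Send one long fact, then return the agent's archival passages."""
    client.agents.messages.create(
        agent_id=agent_id,
        messages=[
            {"role": "user", "content": f"Please remember this for later: {fact}"},
        ],
    )
    # Same listing call as the inspect block in letta_demo.py.
    return client.agents.passages.list(agent_id=agent_id, limit=10)


# Usage, appended to letta_demo.py where `client` and `agent` already exist:
#   backstory = "..."  # paste your own 200-word backstory here
#   for p in teach_and_inspect(client, agent.id, backstory):
#       print(p.text[:80])
```

If the returned list is still empty, rephrase the message so the request to store the fact is more explicit, then run it again.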
401 from the cloud. Token didn't load. Verify with python -c "import os; print(os.environ.get('LETTA_API_KEY','MISSING')[:8])".
Empty Recall. The free tier paginates. Pass limit=50 and check again.
"model not available." The free tier has a model allow-list. Try "letta/letta-free" as the model arg if gpt-4o-mini gets refused.
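The model fallback in the last item can be automated by trying each model handle in order. The sketch below keeps the Letta call behind a callable you pass in, so it makes no claim about which exception the client raises; `create_with_fallback` and `fake_create` are hypothetical names, and the stand-in only imitates a refusal so the example runs offline.

```python
def create_with_fallback(create_agent, models):
    """Try each model in order; return (model, agent) for the first success."""
    errors = []
    for model in models:
        try:
            return model, create_agent(model)
        except Exception as err:  # broad on purpose: the error type is not assumed
            errors.append(f"{model}: {err}")
    raise RuntimeError("no model accepted:\n" + "\n".join(errors))


def fake_create(model):
    # Stand-in for the real call; pretends only the free model is allowed.
    if model != "letta/letta-free":
        raise ValueError("model not available")
    return {"id": "agent-demo", "model": model}


model, agent = create_with_fallback(
    fake_create, ["openai/gpt-4o-mini", "letta/letta-free"]
)
print(model)  # letta/letta-free
```

With the real client, create_agent would be a small function that calls client.agents.create with the same memory_blocks and embedding as in letta_demo.py and the model passed in.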