🧠 AI Agent Frameworks — 8-Week Deep Study Plan
Goal: Go from "I've heard of these" to "I could build & deploy production systems with these" in 8 weeks. Time commitment: ~1-2 hours/day, Mon-Fri. Based on: Trending Repos Deep Dive Analysis (Feb 2026). Last updated: February 4, 2026.
📋 Table of Contents
- Week 0: Prep & Prerequisites
- Week 1: Pydantic-AI — The Production SDK ⭐⭐
- Week 2: Microsoft Agent Framework — Enterprise Orchestration ⭐⭐⭐
- Week 3: Agent-S — Computer Use Pioneer ⭐⭐⭐⭐
- Week 4: GPT Researcher — Deep Research Agent ⭐⭐
- Week 5: Yao — Event-Driven Agents in Go ⭐⭐⭐⭐
- Week 6: MetaGPT — Multi-Agent SOP Framework ⭐⭐⭐
- Week 7: ElizaOS — Deployment & Multi-Platform Distribution ⭐⭐
- Week 8: Capstone Project
- Appendix: Comparison Matrix Template
⭐ = Difficulty Rating (1-5). More stars = harder week.
Week 0: Prep & Prerequisites
Timeline: The weekend before you start. ~3-4 hours total.
Environment Setup
- Python 3.11+ installed (`python --version`)
- Go 1.21+ installed for Week 5 (`go version`)
- Node.js 18+ and `pnpm` installed (needed for ElizaOS in Week 7; MetaGPT also relies on Node tooling for diagram rendering)
- Docker Desktop installed and running
- Git configured with SSH keys for cloning repos
- VS Code (or your editor) with Python + Go extensions
- A GPU or cloud GPU access (optional, helps for the Agent-S grounding model)
API Keys & Accounts
- OpenAI API key — used by almost every framework
- Anthropic API key — primary for Pydantic-AI examples
- Tavily API key — required for GPT Researcher (free tier works: app.tavily.com)
- Azure OpenAI access — needed for Microsoft Agent Framework (free trial available)
- Hugging Face account + token — needed for Agent-S grounding model
- Google API key — optional, for Gemini-based features in GPT Researcher
Workspace Setup
# Create a clean workspace for all 8 weeks
mkdir -p ~/agent-study/{week1-pydantic-ai,week2-ms-agent,week3-agent-s,week4-gpt-researcher,week5-yao,week6-metagpt,week7-elizaos,capstone}
mkdir -p ~/agent-study/notes
mkdir -p ~/agent-study/comparison-matrix
# Initialize a git repo for your study notes
cd ~/agent-study
git init
echo "# AI Agent Frameworks Study" > README.md
git add . && git commit -m "init study workspace"
Background Reading (1-2 hours)
Read these before Week 1. They're the conceptual foundation:
- Plan-and-Solve Prompting — The paper behind GPT Researcher's architecture. Skim the abstract + Section 3.
- RAG paper — Core concept used by multiple frameworks. Read abstract + intro.
- Model Context Protocol (MCP) spec — Anthropic's protocol for tool integration. Read the overview page.
- Agent2Agent (A2A) protocol — Google's agent interop standard. Skim the spec overview.
- Pydantic docs (crash course) — If you're rusty on Pydantic, spend 30 min here. It's the foundation of Week 1.
Mental Model to Build
Every agent framework answers the same 5 questions differently:
- How do you define an agent? (class, function, config, DSL)
- How do agents use tools? (function calling, MCP, code execution)
- How do multiple agents coordinate? (graph, SOP, message passing, events)
- How do you handle errors & retries? (automatic, manual, durable execution)
- How do you observe what happened? (logging, tracing, replay)
Keep these questions in mind every week. By Week 7, you'll have seven different answers for each.
Week 1: Pydantic-AI
Difficulty: ⭐⭐ (Approachable — excellent docs, familiar Python patterns) Repo: github.com/pydantic/pydantic-ai Stars: 14.6k | Language: Python | Version: v1.52.0+
Why This Is Week 1
Pydantic-AI is the most ergonomic agent framework and has the best docs. Starting here builds your mental model for how agent SDKs should feel. Everything after this week will be compared to Pydantic-AI's developer experience. It's the FastAPI of agents — you'll understand why once you use it.
Resources
| Resource | Link |
|---|---|
| 📖 Documentation | ai.pydantic.dev |
| 💬 Community (Slack) | Pydantic Slack |
| 📦 PyPI | pydantic-ai |
| 🔭 Observability | Pydantic Logfire |
| 📝 Blog: How it was built | Pydantic blog |
| 🎥 Intro video | Search "Pydantic AI tutorial 2025" on YouTube |
🗂 Source Code Guide — "Read THESE Files"
pydantic_ai_slim/pydantic_ai/
├── agent/
│ └── __init__.py # ⭐ THE file. Agent class definition, run(), run_sync(), run_stream()
├── _agent_graph.py # ⭐ Internal agent execution graph — how runs actually execute
├── tools.py # ⭐ Tool decorator, RunContext, tool schema generation
├── result.py # ⭐ RunResult, StreamedRunResult — output handling
├── models/
│ ├── __init__.py # Model ABC — how all model providers implement the same interface
│ ├── openai.py # OpenAI provider implementation
│ └── anthropic.py # Anthropic provider implementation
├── _a2a.py # Agent2Agent protocol integration
├── mcp.py # MCP client/server integration
└── _output.py # Output type handling, Pydantic validation on LLM outputs
💡 Tip: Start with `agent/__init__.py`. It's beautifully documented with docstrings. Then read `tools.py` to understand how the `@agent.tool` decorator works. Finally, read `_agent_graph.py` to see how the runtime orchestrates tool calls.
Day 1 (Monday): Architecture Deep Dive
Read:
- The full README
- Docs: Introduction
- Docs: Agents
- Docs: Models Overview
- Docs: Tools
- Docs: Output / Structured Results
- Docs: Dependency Injection (if present in the current docs), or study the DI pattern in the bank support example
Identify core abstractions:
- `Agent` — the central class (generic over deps + output type)
- `RunContext` — carries dependencies into tool functions
- `Tool` — decorated functions the LLM can call
- `ModelSettings` — per-request model configuration
- `RunResult` / `StreamedRunResult` — typed output containers
Understand the execution flow:
User prompt → Agent.run() → Model call → [Tool call → Tool execution → Model call]* → Validated output
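That loop can be sketched in a few lines of plain Python. This is a toy illustration of the control flow with a stubbed model and a fake tool, not Pydantic-AI's actual internals:

```python
from typing import Callable

def run_agent(prompt: str, model: Callable, tools: dict[str, Callable]) -> str:
    """Toy agent loop: call the model, execute any requested tool, repeat."""
    messages = [("user", prompt)]
    while True:
        action, payload = model(messages)       # stubbed LLM decision
        if action == "tool_call":
            name, args = payload
            result = tools[name](**args)        # execute the tool
            messages.append(("tool", result))   # feed the result back to the model
        else:
            return payload                      # final output

# Stub model: first asks for a tool call, then produces an answer.
def stub_model(messages):
    if messages[-1][0] == "user":
        return "tool_call", ("add", {"a": 2, "b": 3})
    return "final", f"The sum is {messages[-1][1]}"

answer = run_agent("What is 2 + 3?", stub_model, {"add": lambda a, b: a + b})
print(answer)  # The sum is 5
```

When you read `_agent_graph.py` on Day 4, look for this same loop with validation and retries layered on top.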
- 📝 Homework: Write a 1-page architecture summary at `~/agent-study/notes/week1-architecture.md`
  - Cover: agent lifecycle, the dependency injection pattern, how tools are registered and called, how output validation works
  - Draw a simple diagram (ASCII or a photo of a hand-drawn sketch is fine)
Day 2 (Tuesday): Hello World + Core Concepts
Setup:
cd ~/agent-study/week1-pydantic-ai
python -m venv .venv && source .venv/bin/activate
pip install pydantic-ai
Run the quickstart:
from pydantic_ai import Agent

agent = Agent(
    'anthropic:claude-sonnet-4-0',
    instructions='Be concise, reply with one sentence.',
)
result = agent.run_sync('Where does "hello world" come from?')
print(result.output)
Understand the core API surface:
- `agent.run()` vs `agent.run_sync()` vs `agent.run_stream()`
- How `instructions` work (static string vs dynamic function)
- How model selection works (string shorthand vs model objects)
- How `result.output` is typed
- 📝 Homework: Build the simplest agent from scratch — NO copy-paste
  - Requirements: takes a topic, returns a structured output (use a Pydantic model as the output type)
  - Must use at least one custom instruction
  - Save at `~/agent-study/week1-pydantic-ai/hello_agent.py`
Day 3 (Wednesday): Intermediate Build — Structured Output + DI
Focus: Pydantic-AI's killer features — type-safe structured output and dependency injection
Work through:
- The bank support agent example from the docs
- Docs: Structured Output / Streamed Results
- Docs: Graph Support
Key concepts to grok:
- How `RunContext[DepsType]` carries typed dependencies
- How Pydantic models as output types create validated, structured responses
- How tool docstrings become the tool description sent to the LLM
- How streaming works with structured output (partial validation!)
- 📝 Homework: Build an agent that uses the framework's unique capabilities:
  - Must include: dependency injection with a real dependency (database mock, API client, etc.)
  - Must include: structured output via a Pydantic model (not just string output)
  - Must include: at least 2 tools
  - Example idea: a "recipe finder" agent with deps for a recipe database, tools for searching and filtering, output as a structured `Recipe` model
  - Save at `~/agent-study/week1-pydantic-ai/structured_agent.py`
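If you want a framework-free warm-up first, here is the DI + validated-output concept in stdlib-only Python. All names here (`Recipe`, `Deps`, the tool) are hypothetical; in Pydantic-AI the deps travel in `RunContext` and validation is done by your Pydantic output model:

```python
import json
from dataclasses import dataclass

@dataclass
class Recipe:
    name: str
    minutes: int

@dataclass
class Deps:
    """The 'injected' dependency: in Pydantic-AI this rides in RunContext."""
    recipe_db: dict[str, int]

def search_tool(deps: Deps, query: str) -> str:
    """Tool: look up cook times in the injected database."""
    hits = {k: v for k, v in deps.recipe_db.items() if query in k}
    return json.dumps(hits)

def validate_output(raw: str) -> Recipe:
    """Parse the model's JSON and enforce the schema (Pydantic does this for you)."""
    data = json.loads(raw)
    if not isinstance(data.get("minutes"), int):
        raise ValueError("minutes must be an int")  # would trigger a model retry
    return Recipe(name=data["name"], minutes=data["minutes"])

deps = Deps(recipe_db={"lentil soup": 40, "lentil salad": 15})
tool_result = search_tool(deps, "soup")
recipe = validate_output('{"name": "lentil soup", "minutes": 40}')
print(recipe)
```

Once this clicks, the Pydantic-AI version is the same shape with the boilerplate handled for you.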
Day 4 (Thursday): Advanced Patterns + Source Code Reading
Read these source files (in order):
- `pydantic_ai_slim/pydantic_ai/agent/__init__.py` — how the `Agent` class is defined, the generic type parameters
- `pydantic_ai_slim/pydantic_ai/tools.py` — how `@tool` works, schema generation, `RunContext`
- `pydantic_ai_slim/pydantic_ai/_agent_graph.py` — the internal execution engine
- `pydantic_ai_slim/pydantic_ai/result.py` — how results are wrapped, streamed, validated
- `pydantic_ai_slim/pydantic_ai/models/__init__.py` — the model provider ABC
Understand:
- How errors from tool execution are passed back to the LLM for retry
- How streaming works internally (incremental Pydantic validation)
- How `_agent_graph.py` orchestrates the conversation loop
- How durable execution checkpoints work
Explore advanced features:
- Docs: Durable Execution
- Docs: MCP Integration
- Docs: Human-in-the-Loop
- Docs: Evals
- 📝 Homework: Write "What I'd Steal from Pydantic-AI" at `~/agent-study/notes/week1-steal.md`
  - Focus on: DI pattern, type-safe generics, streaming validation, tool retry pattern
  - What design decisions are genius? What would you do differently?
Day 5 (Friday): Integration Project + Reflection
- Build a mini-project that integrates with something real:
  - Suggested: an agent that queries a real API (weather, GitHub, Hacker News), processes the data through tools, and returns a structured report as a Pydantic model
  - Bonus: add Logfire observability (there's a free tier) and inspect the traces
  - Bonus: expose it as an MCP server
  - Save at `~/agent-study/week1-pydantic-ai/integration_project/`
- Write retrospective at `~/agent-study/notes/week1-retro.md`:
  - Strengths of Pydantic-AI
  - Weaknesses / gaps you noticed
  - When would you reach for this vs building from scratch?
  - What surprised you?
- Start comparison matrix at `~/agent-study/comparison-matrix/matrix.md` (see template)
🎯 Key Questions — You Should Be Able to Answer:
- What does the `Agent` class generic signature `Agent[DepsType, OutputType]` buy you?
- How does dependency injection work in Pydantic-AI and why is it better than global state?
- How does Pydantic-AI validate structured output from an LLM that returns free-form text?
- What happens when a tool call fails? How does the retry loop work?
- What's the difference between `run()`, `run_sync()`, and `run_stream()`?
- How would you add a new model provider to Pydantic-AI?
- What is durable execution and when would you use it?
Week 2: Microsoft Agent Framework
Difficulty: ⭐⭐⭐ (Larger surface area, graph concepts, mono-repo navigation) Repo: github.com/microsoft/agent-framework Stars: 7k | Languages: Python + .NET | Born from: Semantic Kernel + AutoGen
Why This Is Week 2
If Pydantic-AI is the developer's choice, Microsoft Agent Framework is the enterprise's choice. It introduces graph-based workflows — a fundamentally different orchestration model from the simple agent loop you learned in Week 1. Understanding this framework means understanding where corporate AI agent development is heading.
Resources
| Resource | Link |
|---|---|
| 📖 Documentation | learn.microsoft.com/agent-framework |
| 🚀 Quick Start | Quick Start Tutorial |
| 💬 Discord | Discord |
| 🎥 Intro Video (30 min) | YouTube |
| 🎥 DevUI Demo (1 min) | YouTube |
| 📦 PyPI | agent-framework |
| 📝 Migration from SK | Semantic Kernel Migration |
| 📝 Migration from AutoGen | AutoGen Migration |
🗂 Source Code Guide
python/packages/
├── agent-framework/ # ⭐ Core package — agents, middleware, workflows
│ └── src/agent_framework/
│ ├── agents/ # Agent base classes and implementations
│ ├── workflows/ # ⭐ Graph-based workflow engine
│ └── middleware/ # ⭐ Request/response middleware pipeline
├── azure-ai/ # Azure AI provider (Responses API)
├── openai/ # OpenAI provider
├── anthropic/ # Anthropic provider
├── devui/ # ⭐ Developer UI for debugging workflows
├── mcp/ # MCP integration
├── a2a/ # Agent2Agent protocol
└── lab/ # Experimental features (benchmarking, RL)
python/samples/getting_started/
├── agents/ # ⭐ Start here — basic agent examples
├── workflows/ # ⭐ Graph workflow examples (critical!)
├── middleware/ # Middleware examples
└── observability/ # OpenTelemetry integration
💡 Tip: This is a mono-repo. Don't try to read everything. Focus on `python/packages/agent-framework/` for the core, and `python/samples/getting_started/workflows/` for the graph workflow examples.
Day 1 (Monday): Architecture Deep Dive
Read:
- Overview
- The full README
- User Guide Overview
- Watch the 30-min intro video (at 1.5x speed)
- Skim the SK migration guide to understand lineage
Identify core abstractions:
- `Agent` — base agent interface
- `Workflow` / `Graph` — the graph-based orchestration system
- `Middleware` — request/response processing pipeline
- `AgentProvider` — LLM provider abstraction
- `DevUI` — visual debugging tool
Key architectural insight: This framework uses a data-flow graph model where nodes are agents or functions, and edges carry data between them. This is fundamentally different from Pydantic-AI's linear agent loop.
- 📝 Homework: Write a 1-page architecture summary at `~/agent-study/notes/week2-architecture.md`
  - Compare the graph workflow model to Pydantic-AI's linear model
  - Draw the graph workflow concept (nodes = agents/functions, edges = data flow)
Day 2 (Tuesday): Hello World + Core Concepts
Setup:
cd ~/agent-study/week2-ms-agent
python -m venv .venv && source .venv/bin/activate
pip install agent-framework --pre
# You'll need Azure credentials or an OpenAI key
Run the quickstart:
import asyncio

from agent_framework.openai import OpenAIChatClient

async def main():
    agent = OpenAIChatClient(
        api_key="your-key"
    ).as_agent(
        name="HaikuBot",
        instructions="You are an upbeat assistant that writes beautifully.",
    )
    print(await agent.run("Write a haiku about AI agents."))

asyncio.run(main())
Understand:
- The `as_agent()` pattern — how providers become agents
- The difference between Chat agents and Responses agents
- How the Python API differs from the .NET API (skim a .NET example)
- 📝 Homework: Build the simplest agent from scratch — NO copy-paste
  - Save at `~/agent-study/week2-ms-agent/hello_agent.py`
Day 3 (Wednesday): Intermediate Build — Graph Workflows
This is the key differentiator. This is the day that matters.
Work through:
- `python/samples/getting_started/workflows/` — all examples
- Docs: Workflow/Graph tutorials on learn.microsoft.com
- Understand streaming, checkpointing, and time-travel in graphs
Key concepts:
- How nodes in a graph can be agents OR deterministic functions
- How data flows between nodes via typed edges
- How checkpointing enables pause/resume of long-running workflows
- How human-in-the-loop fits into the graph model
- How time-travel lets you replay/debug workflows
- 📝 Homework: Build a graph workflow:
  - Must include: at least 3 nodes (a mix of agent nodes and function nodes)
  - Must include: branching logic (conditional edges)
  - Example idea: a "content pipeline" — Node 1 (agent: research a topic) → Node 2 (function: format research) → Node 3 (agent: write blog post), with a branch for "needs more research"
  - Save at `~/agent-study/week2-ms-agent/graph_workflow.py`
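Before reaching for the real workflow API, the node/edge/branch idea can be sketched framework-free. The node names and the `facts < 4` threshold below are made up for illustration:

```python
# Toy graph: nodes are callables sharing a state dict; a router picks the edge.

def research(state: dict) -> dict:
    state["facts"] = state.get("facts", 0) + 2   # pretend an agent gathered 2 facts
    return state

def needs_more(state: dict) -> str:
    """Conditional edge: loop back to research until we have enough facts."""
    return "research" if state["facts"] < 4 else "write"

def write(state: dict) -> dict:
    state["post"] = f"Blog post built from {state['facts']} facts"
    return state

NODES = {"research": research, "write": write}

def run_graph(state: dict) -> dict:
    node = "research"
    while True:
        state = NODES[node](state)
        if node == "write":      # terminal node
            return state
        node = needs_more(state)

final = run_graph({})
print(final["post"])  # Blog post built from 4 facts
```

In the real framework the state dict becomes typed edge data, and checkpointing means this loop can be paused and resumed between nodes.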
Day 4 (Thursday): Advanced Patterns + Source Code Reading
Read these source files:
- Core agent base classes in `python/packages/agent-framework/`
- Workflow/graph engine implementation
- Middleware pipeline implementation
- DevUI package structure
- At least one provider implementation (OpenAI or Azure)
Explore:
- Set up and run the DevUI — visualize your graph workflow from Day 3
- Look at the OpenTelemetry integration — `python/samples/getting_started/observability/`
- Read the middleware examples — understand the request/response pipeline
- Check out the lab package — what's experimental?
- 📝 Homework: Write "What I'd Steal from MS Agent Framework" at `~/agent-study/notes/week2-steal.md`
  - Focus on: graph workflow model, DevUI concept, middleware pipeline, multi-language support
  - Compare to Pydantic-AI: when would you choose one over the other?
Day 5 (Friday): Integration Project + Reflection
- Build a mini-project:
  - Suggested: a multi-step data processing pipeline using graph workflows
  - Must have: at least one agent node calling an LLM, at least one pure function node, checkpointing enabled
  - Bonus: get the DevUI running and screenshot your workflow visualization
  - Save at `~/agent-study/week2-ms-agent/integration_project/`
- Write retrospective at `~/agent-study/notes/week2-retro.md`
- Update comparison matrix — add an MS Agent Framework entry
🎯 Key Questions:
- What's the difference between a linear agent loop and a graph-based workflow?
- How does checkpointing work in MS Agent Framework workflows?
- What does "time-travel" mean in the context of agent debugging?
- How does the middleware pipeline work and when would you use it?
- What's the DevUI and what can you debug with it that you can't with logs alone?
- How does this framework's agent abstraction compare to Pydantic-AI's `Agent` class?
- When would you choose MS Agent Framework over Pydantic-AI? (Think: team size, workflow complexity, language requirements)
Week 3: Agent-S
Difficulty: ⭐⭐⭐⭐ (Requires GPU for grounding model, novel paradigm, research-grade code) Repo: github.com/simular-ai/Agent-S Stars: 9.6k | Language: Python | Papers: ICLR 2025, COLM 2025
Why This Is Week 3
This is a completely different paradigm. Weeks 1-2 were about agents that work with APIs and text. Agent-S works with pixels and clicks — it uses your computer like a human does. This is the frontier of agent development. Understanding Agent-S means understanding where computer-use agents are heading.
Resources
| Resource | Link |
|---|---|
| 📖 Repo | github.com/simular-ai/Agent-S |
| 💬 Discord | Discord |
| 📄 S1 Paper (ICLR 2025) | arxiv.org/abs/2410.08164 |
| 📄 S2 Paper (COLM 2025) | arxiv.org/abs/2504.00906 |
| 📄 S3 Paper | arxiv.org/abs/2510.02250 |
| 🌐 S3 Blog | simular.ai/articles/agent-s3 |
| 🎥 S3 Video | YouTube |
| 📦 PyPI | gui-agents |
| 🤗 Grounding Model | UI-TARS-1.5-7B |
🗂 Source Code Guide
gui_agents/
├── s3/ # ⭐ Latest version — start here
│ ├── cli_app.py # ⭐ Entry point — CLI application, main loop
│ ├── agents/ # ⭐ Agent implementations (planning, grounding, execution)
│ ├── core/ # ⭐ Core abstractions (screenshot, actions, state)
│ ├── bbon/ # Behavior Best-of-N — sampling strategy for better performance
│ └── prompts/ # System prompts for each agent role
├── s2/ # Previous version
├── s2_5/ # Intermediate version
├── s1/ # Original version (ICLR 2025)
└── utils.py # Shared utilities
💡 Tip: Focus entirely on `gui_agents/s3/`. Read the papers' system diagrams first, THEN the code. The code makes 10x more sense with the paper's architecture diagram in front of you.
⚠️ Setup Note: Agent-S requires a grounding model (UI-TARS-1.5-7B). You can host it on Hugging Face Inference Endpoints (~$1-2/hr for A10G), use a free tier if available, or run it locally if you have a capable GPU (16GB+ VRAM). Alternatively, study the code architecture without running the full system.
Day 1 (Monday): Architecture Deep Dive
Read:
- The full README
- S3 blog post — accessible overview
- S1 Paper (at least abstract + Sections 1-3) — core architecture concepts
- S3 Paper (abstract + architecture section) — latest improvements
- `models.md` in the repo — supported model configurations
Identify core abstractions:
- Screenshot Capture — the agent "sees" the screen as an image
- Grounding Model (UI-TARS) — converts screenshots to UI element locations
- Planning Agent — decides what to do based on current screen + goal
- Execution Agent — translates plans into mouse/keyboard actions
- Behavior Best-of-N (bBoN) — run multiple rollouts, pick the best
The pipeline:
Task → Screenshot → Grounding (UI-TARS: identify elements) → Planning (LLM: what to do) → Action (click/type/scroll) → New Screenshot → Loop
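Stubbed out in Python, the loop looks like this. The "grounding" and "planning" functions here are trivial stand-ins for UI-TARS and the planner LLM:

```python
# Toy computer-use loop mirroring the pipeline above.

def screenshot(world: dict) -> str:
    return world["screen"]                      # stand-in for a pixel capture

def ground(image: str) -> dict:
    # Stub grounding model: "find" a button's coordinates in the image text.
    return {"submit_button": (100, 200)} if "Submit" in image else {}

def plan(elements: dict, goal: str) -> tuple:
    if goal == "submit form" and "submit_button" in elements:
        return ("click", elements["submit_button"])
    return ("done", None)

def act(world: dict, action: tuple) -> None:
    if action[0] == "click":
        world["screen"] = "Form submitted"      # the click changes the screen

world = {"screen": "Form with a Submit button"}
steps = []
for _ in range(5):                              # bounded loop, like max_steps
    action = plan(ground(screenshot(world)), "submit form")
    steps.append(action[0])
    if action[0] == "done":
        break
    act(world, action)
print(steps)  # ['click', 'done']
```

Notice that the agent never sees app internals, only the screen: every iteration starts from a fresh observation, which is why grounding quality dominates performance.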
- 📝 Homework: Write architecture summary at `~/agent-study/notes/week3-architecture.md`
  - Include the screenshot → grounding → planning → action pipeline
  - Explain bBoN and why it matters (72.6% vs 66% on OSWorld)
  - Compare: how is "seeing" a screen different from "calling" an API?
Day 2 (Tuesday): Hello World + Core Concepts
Setup:
cd ~/agent-study/week3-agent-s
python -m venv .venv && source .venv/bin/activate
pip install gui-agents
brew install tesseract  # Required dependency (macOS; on Linux, install via your package manager)
API configuration:
export OPENAI_API_KEY=<your-key>
export ANTHROPIC_API_KEY=<your-key>
export HF_TOKEN=<your-huggingface-token>
Run Agent-S3 (if you have grounding model access):
agent_s \
--provider openai \
--model gpt-4o \
--ground_provider huggingface \
--ground_url <your-endpoint-url> \
--ground_model ui-tars-1.5-7b \
--grounding_width 1920 \
--grounding_height 1080
If you can't run it: read through `gui_agents/s3/cli_app.py` line by line and trace the execution flow. Understand what WOULD happen at each step.
- 📝 Homework: Even if you can't run the full agent, build a minimal screenshot → analysis script:
  - Take a screenshot, send it to a vision model, and get a description of UI elements — this exercises the same "visual grounding" concept, just simplified
  - Save at `~/agent-study/week3-agent-s/hello_agent.py`
Day 3 (Wednesday): Intermediate Build — Understanding Computer Use
Work through:
- Read `gui_agents/s3/agents/` — understand the multi-agent architecture
- Read `gui_agents/s3/core/` — how screenshots are captured and actions are executed
- Study the prompt templates in `gui_agents/s3/` — how the LLM is instructed
- Understand the bBoN strategy in `gui_agents/s3/bbon/`
Key concepts:
- How screenshots are processed and annotated for the LLM
- How the grounding model converts visual elements to coordinates
- How actions (click, type, scroll) are executed at the OS level
- Cross-platform differences (Linux/Mac/Windows)
- The local coding environment feature
- 📝 Homework: Build something that uses the computer-use paradigm:
  - Option A (with GPU): give Agent-S a simple task (open a browser, search for something, copy a result)
  - Option B (without GPU): build a simplified "screen reader" agent that takes a screenshot, uses a vision model to understand the UI, and outputs a structured description of what's on screen + suggested next actions
  - Save at `~/agent-study/week3-agent-s/computer_use_demo/`
Day 4 (Thursday): Advanced Patterns + Source Code Reading
Read these source files (in order):
- `gui_agents/s3/cli_app.py` — main entry point, execution loop
- `gui_agents/s3/agents/` — each agent role (planner, executor, grounding)
- `gui_agents/s3/core/` — screenshot capture, action execution, state management
- `gui_agents/s3/bbon/` — Behavior Best-of-N implementation
- `gui_agents/s1/` (briefly) — compare S1 architecture to S3 to see the evolution
Explore the papers' techniques:
- How does "experience-augmented hierarchical planning" work? (S1)
- What's the "Mixture of Grounding" approach? (S2)
- How does S3 achieve simplicity while improving performance?
- 📝 Homework: Write "What I'd Steal from Agent-S" at `~/agent-study/notes/week3-steal.md`
  - Focus on: the screenshot → grounding → action pipeline, bBoN strategy, cross-platform abstractions
  - Think about: could you add computer-use capabilities to a Pydantic-AI agent as a tool?
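The core Behavior Best-of-N idea fits in a few lines: run N independent rollouts and let a judge pick the winner. The real judge is an LLM comparing behavior narratives; the stub below just takes the max over fake scores:

```python
import random

def rollout(seed: int) -> dict:
    """Stub rollout: a trajectory with a success score the judge can compare."""
    rng = random.Random(seed)                   # deterministic per-seed "behavior"
    return {"seed": seed, "score": rng.random()}

def judge(trajectories: list[dict]) -> dict:
    # Real bBoN uses an LLM judge over behavior narratives; here, max score.
    return max(trajectories, key=lambda t: t["score"])

trajectories = [rollout(s) for s in range(5)]
best = judge(trajectories)
print(best["seed"])
```

The trade-off to notice: N rollouts cost N times the compute, so bBoN buys reliability with latency and spend.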
Day 5 (Friday): Integration Project + Reflection
- Build a mini-project:
  - Suggested: a "screen monitoring" agent that periodically screenshots your desktop, uses a vision model to understand what's happening, and logs structured summaries (using Pydantic-AI for the structured output!)
  - Alternative: build a browser automation agent using Playwright + a vision model (a simplified version of Agent-S's approach)
  - Save at `~/agent-study/week3-agent-s/integration_project/`
- Write retrospective at `~/agent-study/notes/week3-retro.md`
- Update comparison matrix
🎯 Key Questions:
- What is the screenshot → grounding → action pipeline and why is it powerful?
- Why does Agent-S need a separate grounding model (UI-TARS) in addition to the planning LLM?
- What is Behavior Best-of-N and how does it improve performance by ~6%?
- How is computer-use fundamentally different from API-based agent frameworks?
- What are the security implications of an agent that can control your mouse and keyboard?
- What's the difference between Agent-S's approach and Anthropic's Computer Use or OpenAI's Operator?
- When would you use computer-use agents vs. API-based agents? Give 3 examples of each.
Week 4: GPT Researcher
Difficulty: ⭐⭐ (Straightforward architecture, well-documented, familiar patterns) Repo: github.com/assafelovic/gpt-researcher Stars: 25k | Language: Python
Why This Is Week 4
After 3 weeks of studying how agents work internally, this week is about studying a complete, purpose-built agent that does one thing extremely well: research. GPT Researcher is the best example of the "Plan-and-Solve + RAG" pattern — a design you'll reuse in your own projects.
Resources
| Resource | Link |
|---|---|
| 📖 Documentation | docs.gptr.dev |
| 💬 Discord | Discord |
| 📦 PyPI | gpt-researcher |
| 📝 Blog: How it was built | docs.gptr.dev/blog |
| 🎥 Demo | YouTube |
| 🔧 MCP Integration | MCP Guide |
| 📜 Plan-and-Solve Paper | arxiv.org/abs/2305.04091 |
🗂 Source Code Guide
gpt_researcher/
├── agent.py # ⭐ THE file. GPTResearcher class — the entire research orchestration
├── actions/ # ⭐ Research actions (generate questions, search, scrape, synthesize)
│ ├── query_processing.py # How research questions are generated from the user query
│ ├── web_search.py # Web search execution
│ └── report_generation.py # Final report synthesis
├── config/ # Configuration management
│ └── config.py # All configurable parameters
├── context/ # ⭐ Context management — how gathered info is stored/retrieved
│ └── compression.py # How context is compressed to fit token limits
├── document/ # Document processing (PDF, web pages, etc.)
├── memory/ # ⭐ Research memory — how the agent remembers what it's found
├── orchestrator/ # ⭐ Deep research — recursive tree exploration
│ └── agent/ # Sub-agents for deep research mode
├── retrievers/ # ⭐ Web/local search implementations (Tavily, DuckDuckGo, MCP, etc.)
└── scraper/ # Web scraping implementations
💡 Tip: `agent.py` is the heart. It's one file, ~700 lines, and it contains the entire research orchestration. Read it top to bottom. Then read `actions/` to understand each step.
Day 1 (Monday): Architecture Deep Dive
Read:
- Full README
- How it was built — the design blog post
- Getting Started
- Customization docs
Understand the Plan-and-Solve architecture:
User Query
→ Planner Agent: Generate N research questions
→ For each question:
→ Crawler Agent: Search web, gather sources
→ Summarizer: Extract relevant info from each source
→ Source tracker: Track citations
→ Publisher Agent: Aggregate all findings into a report
Deep Research mode adds recursion:
User Query → Generate sub-topics → For each sub-topic → Generate deeper sub-topics → ... → Aggregate bottom-up
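A toy version of that recursion makes the tree shape concrete. The planner and crawler are stubs, and the branching factor and depth are arbitrary:

```python
# Toy Deep Research: expand sub-topics to a fixed depth, aggregate bottom-up.

def generate_subtopics(topic: str) -> list[str]:
    return [f"{topic} / sub{i}" for i in range(2)]   # stub planner, branching = 2

def research_leaf(topic: str) -> str:
    return f"findings({topic})"                      # stub crawler + summarizer

def deep_research(topic: str, depth: int) -> str:
    if depth == 0:
        return research_leaf(topic)                  # leaves do the actual research
    children = [deep_research(t, depth - 1) for t in generate_subtopics(topic)]
    return f"report({topic}: " + "; ".join(children) + ")"   # aggregate upward

report = deep_research("AI agents", depth=2)
print(report.count("findings("))  # 4 — branching factor 2 at depth 2
```

The cost implication is worth internalizing: leaves grow exponentially with depth, which is why real Deep Research runs are slow and token-hungry.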
- 📝 Homework: Write architecture summary at `~/agent-study/notes/week4-architecture.md`
Day 2 (Tuesday): Hello World + Core Concepts
Setup:
cd ~/agent-study/week4-gpt-researcher
python -m venv .venv && source .venv/bin/activate
pip install gpt-researcher
# Set required API keys
export OPENAI_API_KEY=<your-key>
export TAVILY_API_KEY=<your-key>
Run the simplest version:
import asyncio

from gpt_researcher import GPTResearcher

async def main():
    query = "What are the latest advancements in AI agent frameworks in 2025-2026?"
    researcher = GPTResearcher(query=query)
    research_result = await researcher.conduct_research()
    report = await researcher.write_report()
    print(report)

asyncio.run(main())
Also try the web UI:
git clone https://github.com/assafelovic/gpt-researcher.git
cd gpt-researcher
pip install -r requirements.txt
python -m uvicorn main:app --reload
# Visit http://localhost:8000
- 📝 Homework: Build a minimal research agent from scratch — NO copy-paste
  - Save at `~/agent-study/week4-gpt-researcher/hello_researcher.py`
Day 3 (Wednesday): Intermediate Build — Deep Research + MCP
Focus: GPT Researcher's key differentiators — Deep Research mode and MCP integration
Work through:
- Deep Research docs
- MCP Integration Guide
- Local document research
- Run a Deep Research query and observe the recursive tree exploration
Key concepts:
- How Deep Research recursively explores sub-topics
- How MCP connects GPT Researcher to external data sources
- How context compression prevents token limit issues
- How source tracking and citations work
- The difference between web research and local document research
- 📝 Homework: Build a research agent that uses GPT Researcher's unique capabilities:
  - Must include: MCP integration with at least one external source (e.g., a GitHub MCP server)
  - OR: research over local documents (PDFs, markdown files from your study notes)
  - Bonus: use Deep Research mode for a complex topic
  - Save at `~/agent-study/week4-gpt-researcher/deep_research_demo.py`
Day 4 (Thursday): Advanced Patterns + Source Code Reading
Read these source files (in order):
- `gpt_researcher/agent.py` — the entire `GPTResearcher` class, top to bottom
- `gpt_researcher/actions/query_processing.py` — how research questions are generated
- `gpt_researcher/context/compression.py` — how context is managed within token limits
- `gpt_researcher/orchestrator/` — deep research recursive tree implementation
- `gpt_researcher/retrievers/` — how different search providers are integrated
Understand:
- How the planner decomposes a query into research questions
- How the agent handles rate limiting and API failures
- How context compression works (this is critical for long research)
- How the orchestrator manages the recursive tree in Deep Research mode
- How the report generator synthesizes multiple sources into a coherent report
- 📝 Homework: Write "What I'd Steal from GPT Researcher" at `~/agent-study/notes/week4-steal.md`
  - Focus on: Plan-and-Solve decomposition, context compression, source tracking, recursive exploration
  - Compare: how would you build a "deep research" capability into a Pydantic-AI agent?
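Context compression reduces, conceptually, to scoring gathered chunks against the query and keeping the best within a token budget. Below is a stub sketch (word-overlap scoring, a character budget), not GPT Researcher's actual implementation:

```python
def compress(chunks: list[str], query: str, budget: int) -> list[str]:
    """Keep highest-overlap chunks whose combined length fits the budget."""
    def score(chunk: str) -> int:
        # Crude relevance proxy: shared words with the query.
        return len(set(chunk.lower().split()) & set(query.lower().split()))
    kept, used = [], 0
    for chunk in sorted(chunks, key=score, reverse=True):
        if used + len(chunk) <= budget:   # budget stands in for a token limit
            kept.append(chunk)
            used += len(chunk)
    return kept

chunks = [
    "agent frameworks compared in depth",
    "unrelated cooking tips",
    "agent orchestration patterns",
]
kept = compress(chunks, "agent frameworks", budget=40)
print(kept)  # ['agent frameworks compared in depth']
```

When you read `context/compression.py`, note what it uses as the relevance signal and how it counts tokens, then compare it to this naive version.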
Day 5 (Friday): Integration Project + Reflection
- Build a mini-project:
  - Suggested: a "competitive analysis" agent — given a company/product, it researches competitors, pricing, and features, and generates a structured comparison report. Use GPT Researcher's engine + Pydantic-AI for structured output.
  - Alternative: install GPT Researcher as a Claude Skill and use it in your Claude workflow
  - Save at `~/agent-study/week4-gpt-researcher/integration_project/`
- Write retrospective at `~/agent-study/notes/week4-retro.md`
- Update comparison matrix
🎯 Key Questions:
- What is the Plan-and-Solve pattern and how does GPT Researcher implement it?
- How does Deep Research differ from regular research? Draw the tree structure.
- How does context compression prevent token limit issues during long research?
- How does GPT Researcher track and cite sources?
- What search providers does GPT Researcher support and how do you add a new one?
- How could you combine GPT Researcher with Pydantic-AI for structured research outputs?
- What are the limitations of automated research (hallucination, bias, recency)?
Week 5: Yao
Difficulty: ⭐⭐⭐⭐ (Go language, novel architecture, less documentation, paradigm shift) Repo: github.com/YaoApp/yao Stars: 7.5k | Language: Go | Runtime: Single binary with V8 engine
Why This Is Week 5
Yao is the most architecturally unique repo in the entire study. It's not a chatbot framework — it's an autonomous agent engine where agents are triggered by events, schedules, and emails. This is the only Go-based framework, the only one with event-driven architecture, and the only one that deploys as a single binary. If everything else is "AI assistant," Yao is "AI team member."
⚠️ Language Note: This week requires Go. If you don't know Go, spend an extra hour on Day 1 doing the Go Tour. You don't need to be fluent — just enough to read the source code.
Resources
| Resource | Link |
|---|---|
| 🏠 Homepage | yaoapps.com |
| 📖 Documentation | yaoapps.com/docs |
| 🚀 Quick Start | Getting Started |
| ✨ Why Yao? | Why Yao |
| 🤖 Agent Examples | YaoAgents/awesome |
| 📦 Install Script | `curl -fsSL https://yaoapps.com/install.sh \| bash` |
| 🐹 Go Tour (if needed) | go.dev/tour |
🗂 Source Code Guide
```
yao/
├── engine/
│   └── process.go        # ⭐ Process engine — core concept in Yao
├── agent/                # ⭐ Agent framework — autonomous agent definitions
│   ├── agent.go          # Agent lifecycle, trigger modes, execution phases
│   └── triggers/         # Clock, Human, Event trigger implementations
├── runtime/
│   └── v8/               # ⭐ Built-in V8 JavaScript/TypeScript engine
├── rag/
│   └── graph/            # ⭐ Built-in GraphRAG implementation
├── mcp/                  # MCP integration
├── api/                  # HTTP server and REST API
├── model/                # ORM and database layer
└── cmd/
    └── yao/
        └── main.go       # Application entry point
```
💡 Tip: Yao's DSL-based approach means you'll be reading `.yao` files (YAML-like definitions) as much as Go source code. The mental model is: you define agents as data (DSL), and the engine executes them.
Day 1 (Monday): Architecture Deep Dive
Read:
- Full README
- Why Yao?
- Documentation overview
- Skim the Go source: `cmd/yao/main.go` → `engine/process.go` → `agent/agent.go`
Understand Yao's radical differences:
| Traditional Agent | Yao Agent |
|---|---|
| Entry point: chatbox | Entry point: email, events, schedules |
| Passive: you ask, it answers | Proactive: it works autonomously |
| Role: tool | Role: team member |
The six-phase execution model:
Inspiration → Goals → Tasks → Run → Deliver → Learn
Three trigger modes:
- Clock — scheduled tasks (cron-like)
- Human — triggered by email or messages
- Event — triggered by webhooks or database changes
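The trigger modes above can be pictured as different event sources feeding one and the same execution pipeline. A minimal stdlib-Python sketch (hypothetical names, not Yao's actual Go API):

```python
# Hypothetical sketch (not Yao's API): the three trigger modes are just
# different event sources feeding the same agent execution pipeline.
from dataclasses import dataclass

@dataclass
class Trigger:
    mode: str      # "clock" | "human" | "event"
    payload: dict

def run_agent(trigger: Trigger) -> str:
    """One pass through a simplified execution model."""
    goal = f"handle {trigger.mode} trigger"            # Inspiration → Goals
    tasks = [f"process {k}" for k in trigger.payload]  # Goals → Tasks
    results = [t + ": done" for t in tasks]            # Run
    return f"[{goal}] " + "; ".join(results)           # Deliver

# The same agent reacts to a cron tick, an inbound email, or a webhook:
print(run_agent(Trigger("clock", {"schedule": "daily 9am"})))
print(run_agent(Trigger("human", {"email": "please summarize Q3"})))
print(run_agent(Trigger("event", {"webhook": {"order_id": 42}})))
```

The point of the sketch: in an event-driven design the agent's entry point is a trigger object, not a chat message, and the trigger's mode only determines when the pipeline fires, not what the pipeline is.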
- 📝 Homework: Write architecture summary at `~/agent-study/notes/week5-architecture.md`
  - Focus on: How the event-driven model is fundamentally different from request-response
  - Compare: 6-phase execution vs Pydantic-AI's run loop vs MS Agent Framework's graph
Day 2 (Tuesday): Hello World + Core Concepts
Setup:

```bash
# Install Yao (single binary!)
curl -fsSL https://yaoapps.com/install.sh | bash

# Create a project
cd ~/agent-study/week5-yao
mkdir project && cd project
yao start   # First run creates project structure
# Visit http://127.0.0.1:5099
```

Run your first process:

```bash
yao run utils.app.Ping                     # Returns version
yao run scripts.tests.Hello 'Hello, Yao!'  # Run TypeScript
yao run models.tests.pet.Find 1 '::{}'     # Query database
```
Understand core concepts:
- Processes — functions that can be run directly or referenced in code
- Models — database models defined in `.mod.yao` files
- Scripts — TypeScript/JavaScript code executed by the built-in V8 engine
- DSL — Yao's declarative syntax for defining everything
- 📝 Homework: Build the simplest Yao application from scratch:
  - Define a model, write a process, create a simple API endpoint
  - Save project at `~/agent-study/week5-yao/hello_project/`
Day 3 (Wednesday): Intermediate Build — Event-Driven Agents
Focus: What makes Yao unique — event-driven, proactive agents
Work through:
- Agent configuration — defining agents with roles and triggers
- Setting up a scheduled (Clock) trigger
- Setting up an Event trigger (webhook → agent action)
- MCP integration — connecting external tools
- GraphRAG — how the built-in knowledge graph works
Key concepts:
- How agents are defined declaratively (vs. programmatically in Python frameworks)
- How the three trigger modes work in practice
- How agents learn from past executions (the "Learn" phase)
- How GraphRAG combines vector search with graph traversal
- Why a single binary matters for deployment
- 📝 Homework: Build an event-driven agent:
  - Must include: At least 2 different trigger modes (e.g., Clock + Event)
  - Must include: An agent that does something proactively (not just responding to a chat)
  - Example idea: An agent that checks an RSS feed on a schedule (Clock), processes new articles (Run), and stores summaries in the knowledge base (Learn/Deliver)
  - Save at `~/agent-study/week5-yao/event_agent/`
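A toy illustration of the GraphRAG idea: similarity search finds seed documents, then a hop through an entity graph surfaces related facts that similarity alone would miss. This is a stdlib sketch with word overlap standing in for embeddings, not Yao's implementation:

```python
import re

# Illustrative GraphRAG-style retrieval (not Yao's implementation): a
# similarity search finds seed documents, then a walk over an entity graph
# pulls in related facts that vector search alone would miss.

docs = {
    "d1": "Ada Lovelace wrote the first algorithm.",
    "d2": "Charles Babbage designed the Analytical Engine.",
    "d3": "Grace Hopper developed the first compiler.",
}
graph = {  # entity -> related entities
    "Ada Lovelace": ["Charles Babbage"],
    "Charles Babbage": ["Analytical Engine"],
}
entities_in = {"d1": ["Ada Lovelace"], "d2": ["Charles Babbage"], "d3": ["Grace Hopper"]}

def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def vector_search(query: str, k: int = 1) -> list[str]:
    """Stand-in for embedding search: rank docs by word overlap, keep top-k."""
    q = tokens(query)
    return sorted(docs, key=lambda d: -len(q & tokens(docs[d])))[:k]

def graph_expand(doc_ids: list[str]) -> set[str]:
    """Follow one hop of entity relations out from the seed documents."""
    related: set[str] = set()
    for d in doc_ids:
        for ent in entities_in[d]:
            related.update(graph.get(ent, []))
    return related

seeds = vector_search("who wrote the first algorithm")
print(seeds)                # the seed document found by similarity
print(graph_expand(seeds))  # the graph hop surfaces the related "Charles Babbage"
```

A real GraphRAG pipeline extracts the entity graph automatically and walks more than one hop, but the hybrid structure (retrieve, then traverse) is the same.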
Day 4 (Thursday): Advanced Patterns + Source Code Reading
Read these source files (in order):
- `cmd/yao/main.go` — Application entry point, how the single binary initializes
- `engine/process.go` — The process engine (core execution abstraction)
- `agent/agent.go` — Agent lifecycle and execution phases
- `runtime/v8/` — How the V8 engine is embedded for TypeScript support
- `rag/graph/` — GraphRAG implementation (vector + graph hybrid search)
Understand:
- How Go's concurrency model (goroutines) enables event-driven agents
- How the V8 engine is embedded and used for TypeScript execution
- How GraphRAG combines embedding search with entity-relationship traversal
- How a single Go binary includes all these features without external dependencies
- 📝 Homework: Write "What I'd Steal from Yao" at `~/agent-study/notes/week5-steal.md`
  - Focus on: Event-driven architecture, single binary deployment, GraphRAG, DSL approach
  - Think about: Could you add event-driven capabilities to a Python agent framework?
Day 5 (Friday): Integration Project + Reflection
- Build a mini-project:
  - Suggested: A "daily briefing" agent — schedule it to run every morning, have it gather data from APIs (weather, calendar, news), process it, and output a structured briefing. Use the Clock trigger + MCP for external data.
  - Alternative: Build a webhook-triggered agent that processes incoming data and stores it in GraphRAG
  - Save at `~/agent-study/week5-yao/integration_project/`
- Write retrospective at `~/agent-study/notes/week5-retro.md`
- Update comparison matrix
🎯 Key Questions:
- How does Yao's event-driven model differ from the request-response model of every other framework?
- What are the three trigger modes and when would you use each?
- What is the six-phase execution model and how does the "Learn" phase create a feedback loop?
- Why is single-binary deployment a significant advantage? Where would you deploy Yao that you couldn't deploy Python frameworks?
- How does Yao's built-in GraphRAG differ from vector-only RAG?
- What does it mean that Yao embeds a V8 engine? What are the implications for extensibility?
- What types of applications is Yao best suited for vs. worst suited for?
Week 6: MetaGPT
Difficulty: ⭐⭐⭐ (Large codebase, academic concepts, multi-agent complexity) Repo: github.com/FoundationAgents/MetaGPT Stars: 63k | Language: Python | Papers: ICLR 2024 + many more
Why This Is Week 6
MetaGPT is the OG multi-agent framework and the final framework deep dive of the study. It introduces Standard Operating Procedures (SOPs) as the coordination mechanism — a genuinely novel idea that maps human organizational structures onto AI agents. By Week 6, you have enough context from the previous 5 frameworks to deeply appreciate what MetaGPT does differently.
Resources
| Resource | Link |
|---|---|
| 📖 Documentation | docs.deepwisdom.ai |
| 💬 Discord | Discord |
| 📦 PyPI | metagpt |
| 🎯 MGX (commercial product) | mgx.dev |
| 📄 MetaGPT Paper (ICLR 2024) | openreview.net |
| 📄 AFlow Paper (ICLR 2025 Oral) | openreview.net |
| 📝 Agent 101 Tutorial | Agent 101 |
| 📝 MultiAgent 101 | MultiAgent 101 |
| 🤗 HuggingFace Demo | MetaGPT Space |
🗂 Source Code Guide
```
metagpt/
├── roles/                       # ⭐ Role definitions — each role = one agent with a job
│   ├── role.py                  # ⭐ Base Role class — THE core abstraction
│   ├── architect.py             # Software architect agent
│   ├── engineer.py              # Software engineer agent
│   ├── product_manager.py       # Product manager agent
│   ├── project_manager.py       # Project manager agent
│   └── di/
│       └── data_interpreter.py  # Data analysis agent
├── actions/                     # ⭐ Action definitions — what roles can do
│   ├── action.py                # Base Action class
│   ├── write_prd.py             # Write Product Requirements Document
│   ├── write_design.py          # Write system design
│   └── write_code.py            # Write code
├── team.py                      # ⭐ Team orchestration — how roles collaborate via SOPs
├── environment.py               # ⭐ Shared environment — message passing between roles
├── schema.py                    # Message schemas for inter-role communication
├── config2.py                   # Configuration management
├── base/                        # Base classes and utilities
├── memory/                      # Memory management for roles
├── software_company.py          # ⭐ The "software company" end-to-end pipeline
└── utils/
    └── project_repo.py          # Project repository management
```
💡 Tip: The mental model is: Role (who) performs Actions (what) according to SOPs (how). Read `roles/role.py` first, then `actions/action.py`, then `team.py`. That's the holy trinity of MetaGPT.
Day 1 (Monday): Architecture Deep Dive
Read:
- Full README
- Agent 101 Tutorial
- MultiAgent 101 Tutorial
- MetaGPT paper (abstract + Sections 1-3) — the SOP concept
- Skim the AFlow paper abstract — automated workflow generation
Core philosophy: Code = SOP(Team)
Identify core abstractions:
- Role — an agent with a specific job (PM, architect, engineer, etc.)
- Action — a discrete task a role can perform (write PRD, write code, etc.)
- SOP — Standard Operating Procedures that define the workflow between roles
- Team — the orchestrator that manages roles and message passing
- Environment — shared context where roles publish and subscribe to messages
- Message — typed communication between roles
The "software company" pipeline:
```
User Requirement
→ Product Manager (writes PRD)
→ Architect (writes system design)
→ Project Manager (creates task breakdown)
→ Engineer (writes code)
→ QA (tests code)
```
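The pipeline above can be sketched as a pub/sub loop: each role watches one message type and publishes another, so the PRD → Design → Code ordering emerges from subscriptions rather than an explicit graph. Names and classes here are illustrative stdlib Python, not MetaGPT's real API:

```python
# Minimal sketch of SOP-style coordination (illustrative, not MetaGPT's API):
# each role consumes one message kind and publishes another, so the workflow
# order emerges from subscriptions, not from an explicit graph definition.
from collections import deque
from dataclasses import dataclass

@dataclass
class Message:
    kind: str      # e.g. "Requirement", "PRD", "Design", "Code"
    content: str

class Role:
    watches: str = ""
    produces: str = ""
    def act(self, msg: Message) -> Message:
        return Message(self.produces, f"{self.produces} for: {msg.content}")

class ProductManager(Role):
    watches, produces = "Requirement", "PRD"

class Architect(Role):
    watches, produces = "PRD", "Design"

class Engineer(Role):
    watches, produces = "Design", "Code"

def run_team(roles: list[Role], requirement: str) -> list[str]:
    """Shared environment as a queue: whichever role watches a message's kind
    consumes it and publishes its own output. That hand-off chain is the SOP."""
    queue = deque([Message("Requirement", requirement)])
    log = []
    while queue:
        msg = queue.popleft()
        log.append(f"{msg.kind}: {msg.content}")
        for role in roles:
            if role.watches == msg.kind:
                queue.append(role.act(msg))
    return log

for line in run_team([ProductManager(), Architect(), Engineer()], "snake game"):
    print(line)
```

Notice that no code says "PM runs before Architect" — reordering the roles list changes nothing, because the sequence is encoded entirely in what each role watches and produces.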
- 📝 Homework: Write architecture summary at `~/agent-study/notes/week6-architecture.md`
  - Explain the SOP model and how it maps to human organizations
  - Compare: SOP coordination vs Graph workflows (MS) vs Event-driven (Yao) vs Linear (Pydantic-AI)
Day 2 (Tuesday): Hello World + Core Concepts
Setup:

```bash
cd ~/agent-study/week6-metagpt
conda create -n metagpt python=3.11 && conda activate metagpt
pip install --upgrade metagpt
metagpt --init-config  # Creates ~/.metagpt/config2.yaml
# Edit the config to add your API key
```

Run the classic demo:

```bash
metagpt "Create a snake game"  # This will generate a full project in ./workspace
```
Also try programmatically:

```python
from metagpt.software_company import generate_repo
from metagpt.utils.project_repo import ProjectRepo

repo: ProjectRepo = generate_repo("Create a simple calculator app")
print(repo)
```
And try the Data Interpreter:

```python
import asyncio
from metagpt.roles.di.data_interpreter import DataInterpreter

async def main():
    di = DataInterpreter()
    await di.run("Run data analysis on sklearn Iris dataset, include a plot")

asyncio.run(main())
```
- 📝 Homework: Build a custom role from scratch — NO copy-paste:
  - Define a new `Role` subclass with custom `Action`s
  - Example: a "ResearchAnalyst" role that takes a topic and produces a structured analysis
  - Save at `~/agent-study/week6-metagpt/hello_role.py`
Day 3 (Wednesday): Intermediate Build — Multi-Agent SOPs
Focus: MetaGPT's unique capability — SOP-based multi-agent coordination
Work through:
- MultiAgent 101
- Look at the Debate example
- Understand how messages flow between roles via the Environment
- Understand how the SOP defines which role acts after which
Key concepts:
- How roles subscribe to message types from other roles
- How the Team orchestrator manages turn-taking
- How the Environment enables publish/subscribe communication
- How SOPs encode workflow logic without explicit graph definitions
- The difference between the "software company" SOP and custom SOPs
- 📝 Homework: Build a multi-agent system with a custom SOP:
  - Must include: At least 3 custom roles with different responsibilities
  - Must include: Custom message types between roles
  - Must include: A clear SOP workflow (Role A → Role B → Role C)
  - Example idea: A "content creation team" — Researcher (gathers info) → Writer (drafts article) → Editor (reviews and improves) → Publisher (formats final output)
  - Save at `~/agent-study/week6-metagpt/multi_agent_sop.py`
Day 4 (Thursday): Advanced Patterns + Source Code Reading
Read these source files (in order):
- `metagpt/roles/role.py` — Base Role class, how roles think and act
- `metagpt/actions/action.py` — Base Action class, how actions execute
- `metagpt/team.py` — Team orchestration, turn management
- `metagpt/environment.py` — Message passing, pub/sub system
- `metagpt/schema.py` — Message types and schemas
Also explore:
- `metagpt/roles/engineer.py` — how the Engineer role writes code (complex action chain)
- `metagpt/software_company.py` — the end-to-end pipeline
- `metagpt/memory/` — how roles maintain memory across turns
- `examples/` — AFlow and SPO implementations
Advanced concepts:
- How does AFlow (Automated Agentic Workflow Generation) work?
- What is SPO (Self-Supervised Prompt Optimization)?
- How does the Data Interpreter differ from the Software Company pipeline?
- 📝 Homework: Write "What I'd Steal from MetaGPT" at `~/agent-study/notes/week6-steal.md`
  - Focus on: SOP-based coordination, Role/Action abstraction, message-passing environment
  - Reflect on: Which coordination model do you prefer? Graph (MS) vs SOP (MetaGPT) vs Event (Yao)?
Day 5 (Friday): Integration Project + Final Reflection
- Build a mini-project:
  - Suggested: A multi-agent system that takes a business idea and produces a full analysis: Market Researcher role → Business Analyst role → Financial Modeler role → Report Writer role. Each produces a structured output that feeds into the next.
  - Save at `~/agent-study/week6-metagpt/integration_project/`
- Write final retrospective at `~/agent-study/notes/week6-retro.md`
  - This one should be more comprehensive — reflect on ALL 6 weeks
  - What framework would you reach for first? When?
  - What surprised you most across the study?
- Complete comparison matrix — all 6 frameworks
- Commit and push everything to your study git repo
🎯 Key Questions:
- What does "Code = SOP(Team)" mean concretely?
- How does the Role/Action/SOP model map to real organizational structures?
- How do messages flow between roles? What's the pub/sub mechanism?
- What's the difference between MetaGPT's approach and MS Agent Framework's graph workflows?
- How does the Data Interpreter feature differ from the Software Company pipeline?
- What is AFlow and why was it accepted as an oral presentation at ICLR 2025?
- When would you use MetaGPT vs simpler single-agent frameworks?
- Across all 6 frameworks, which coordination model (linear/graph/SOP/event) do you think is most general?
Week 7: ElizaOS
Timeline: 1 week | Difficulty: ⭐⭐ | Goal: Learn agent deployment & multi-platform distribution Repo: elizaOS/eliza | ⭐ 17,476 | TypeScript Why this week: Weeks 1-6 taught you how to BUILD agents. This week teaches you how to DEPLOY them where users actually are.
Why ElizaOS Makes The Cut
After a thorough debate (see the deep dive analysis), ElizaOS earned its spot because:
- It's the only deployment-focused platform on the trending list — multi-platform routing (Discord, Telegram, Twitter, Farcaster) in one framework
- 17k stars with active development and a large community
- The plugin architecture, character system, and platform adapters teach real deployment patterns you won't learn from any other framework studied
- Knowing how to ship agents to where users live is as important as knowing how to build them
Resources
| Resource | URL |
|---|---|
| GitHub | https://github.com/elizaOS/eliza |
| Docs | https://elizaos.github.io/eliza/ |
| Discord | https://discord.gg/elizaos |
| Quickstart | https://elizaos.github.io/eliza/docs/quickstart |
Key Source Files to Read
| File | Why It Matters |
|---|---|
| `packages/core/src/runtime.ts` | The AgentRuntime — the central brain that coordinates everything |
| `packages/core/src/types.ts` | All the core interfaces (Character, Memory, Action, Provider, Evaluator) |
| `packages/plugin-discord/src/index.ts` | How a platform adapter is built — the Discord integration |
| `packages/plugin-telegram/src/index.ts` | Compare with Discord adapter — spot the platform abstraction pattern |
| `packages/core/src/memory.ts` | Memory management — how agents maintain context across platforms |
| `agent/src/index.ts` | The entry point — how everything gets wired together |
Day 1 (Monday): Architecture Deep Dive — The Deployment Platform
Study (1-2 hrs):
- Read the full README and quickstart docs
- Understand the core architecture:
- Character files — how agent personalities are defined (JSON-based)
- AgentRuntime — the central coordinator
- Plugins — how platform adapters, actions, and providers are registered
- Actions vs Evaluators vs Providers — the three extension points
- Memory — how conversation state persists across platforms
- Study the plugin system architecture — how does one agent connect to Discord AND Telegram simultaneously?
- Understand the character file format — what can you configure?
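The runtime/adapter split described above can be sketched in a few lines. This is illustrative Python with invented names (ElizaOS itself is TypeScript, and its real interfaces differ): the point is that platform adapters stay thin while the runtime owns the character and the memory, which is what lets one agent answer on two platforms at once:

```python
# Sketch of the deployment pattern (Python for consistency with earlier weeks;
# ElizaOS is TypeScript, and these names are illustrative, not its real API):
# one runtime + one memory store behind several thin platform adapters.

class AgentRuntime:
    def __init__(self, character: dict):
        self.character = character
        self.memory: list[tuple[str, str]] = []   # (platform, message), shared!

    def handle(self, platform: str, text: str) -> str:
        self.memory.append((platform, text))
        seen = len(self.memory)
        return f"[{self.character['name']}] ({seen} msgs remembered) re: {text}"

class PlatformAdapter:
    """Each adapter only knows how to receive/format for its platform;
    all agent logic and state live in the shared runtime."""
    def __init__(self, name: str, runtime: AgentRuntime):
        self.name, self.runtime = name, runtime

    def on_message(self, text: str) -> str:
        return self.runtime.handle(self.name, text)

runtime = AgentRuntime({"name": "Eliza", "bio": "helpful, curious"})
discord = PlatformAdapter("discord", runtime)
telegram = PlatformAdapter("telegram", runtime)

print(discord.on_message("hi there"))        # memory count: 1
print(telegram.on_message("remember me?"))   # memory count: 2, shared across platforms
```

Adding a third platform means writing one more thin adapter, not touching the agent; that is the economics of the deployment-platform approach.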
Key Questions:
- How does ElizaOS route a message from Discord to the right agent and back?
- What's the difference between an Action, an Evaluator, and a Provider?
- How does the memory system work across platforms? Can an agent remember a Discord convo when talking on Telegram?
- How does the character file influence agent behavior vs hard-coded logic?
Homework:
- Write a 1-page architecture summary covering: runtime → plugins → adapters → memory → character system
- Draw a diagram showing message flow: User sends Discord message → ... → Agent responds
- Compare the architecture to Pydantic-AI's approach — what's different about a "deployment-first" vs "logic-first" framework?
Day 2 (Tuesday): Hello World — Deploy an Agent to Discord
Study (1-2 hrs):
- Set up the ElizaOS development environment:
  - Clone the repo, install deps (`pnpm install`)
  - Create a Discord bot in the Discord Developer Portal (you'll need a test server)
  - Set up your `.env` with Discord bot token and an LLM API key
- Create a custom character file for your agent:
- Define name, bio, personality traits, example conversations
- Set the model provider and platform connections
- Run the agent locally, verify it responds in Discord
Homework:
- Create a character file from scratch (no copy-paste from examples) — give it a distinct personality
- Deploy the agent to your Discord test server and have a 10-message conversation with it
- Screenshot the conversation and note: What worked? What felt off? How does character configuration affect responses?
Day 3 (Wednesday): Multi-Platform + Plugin System
Study (1-2 hrs):
- Add a second platform — connect the same agent to Telegram (or Twitter)
- Same character, same agent, two platforms simultaneously
- Observe: does memory carry across? How does the agent handle platform-specific features?
- Study the plugin architecture:
  - Read how `plugin-discord` and `plugin-telegram` are structured
  - Understand the `Plugin` interface — what does a plugin provide?
  - Look at how Actions work — these are the agent's "tools"
- Write a custom Action plugin:
- Something simple: a weather lookup, a file reader, or a joke generator
- Register it and verify your agent can use it on both platforms
Homework:
- Run your agent on 2 platforms simultaneously — screenshot both conversations
- Build a custom Action plugin from scratch and verify it works
- Write a comparison: how does ElizaOS's plugin system compare to Pydantic-AI's tool system and MetaGPT's action system? What are the trade-offs?
Day 4 (Thursday): Source Code Reading + Advanced Patterns
Study (1-2 hrs):
- Read the key source files from the table above, focusing on:
- runtime.ts — How does the AgentRuntime process an incoming message? What's the evaluation pipeline?
- types.ts — What are all the interfaces? How extensible is the system?
- memory.ts — How is conversation history stored and retrieved? What's the embedding strategy?
- Study advanced patterns:
- Multi-agent setups — can you run multiple agents with different characters?
- Custom evaluators — how do you add post-processing logic?
- Custom providers — how do you inject context into every agent response?
- Compare deployment architecture decisions:
- How does ElizaOS handle rate limiting across platforms?
- How does it handle platform-specific message formatting (embeds, buttons, etc.)?
- What's the error handling strategy when a platform adapter fails?
Homework:
- Write a "What I'd Steal From ElizaOS" doc — which patterns are worth using in your own projects? Think:
- Character file abstraction for agent personality
- Plugin registration pattern
- Platform adapter interface
- Memory routing across services
- Identify the 3 biggest architectural weaknesses (every framework has them)
Day 5 (Friday): Integration Project — Deploy a Week 1-6 Agent
The real test: Take an agent you built in Weeks 1-6 and deploy it to at least one chat platform using patterns learned from ElizaOS.
Options (pick one):
- Pydantic-AI agent → Discord: Take your structured-output agent from Week 1 and wrap it in a Discord bot using ElizaOS's adapter patterns (or build your own minimal adapter inspired by their architecture)
- GPT Researcher → Telegram: Take your research agent from Week 4 and make it accessible via Telegram — users send a topic, agent researches and responds
- Multi-framework pipeline → Discord: Take your Week 6 MetaGPT multi-agent setup and expose it through a Discord interface where users can kick off the SOP workflow
Homework:
- Deploy a previously-built agent to a real chat platform — it must respond to real messages
- Write a retrospective for ElizaOS:
- Strengths: What does it do better than building your own deployment layer?
- Weaknesses: Where is it limited or frustrating?
- When to use: What type of project benefits most from ElizaOS?
- When to skip: When is it overkill or the wrong tool?
- Update the comparison matrix with the ElizaOS column
- Answer: "If I were building a production agent for a client, would I use ElizaOS for deployment or roll my own? Why?"
Key Questions You Should Be Able to Answer After Week 7
- How does ElizaOS's character system differ from hardcoding agent personalities?
- What's the plugin registration lifecycle — from `Plugin` definition to runtime availability?
- How would you add a completely new platform (e.g., Slack, WhatsApp) to ElizaOS?
- What are the trade-offs of a deployment-platform approach vs building bespoke platform integrations?
- How does multi-platform memory work — and where does it break down?
- When is ElizaOS the right choice vs a simple Discord.js bot?
- What deployment patterns from ElizaOS would you steal for a custom agent pipeline?
Week 8: Capstone Project
Timeline: 1 week | Difficulty: ⭐⭐⭐⭐⭐ | Goal: Synthesize learnings from 3+ frameworks
The Project: "Research → Analyze → Act" Pipeline
Build a system that combines at least 3 of the frameworks you studied:
Recommended Architecture
```
┌─────────────────────────────────────────────────────────┐
│                   Capstone Pipeline                     │
│                                                         │
│  ┌──────────────┐   ┌──────────────┐   ┌────────────┐   │
│  │ GPT          │   │ Pydantic-AI  │   │ MetaGPT OR │   │
│  │ Researcher   │──▶│ Structured   │──▶│ MS Agent   │   │
│  │ (Research)   │   │ Analysis     │   │ Framework  │   │
│  │              │   │ Agent        │   │ (Execute)  │   │
│  └──────────────┘   └──────────────┘   └────────────┘   │
│                                                         │
│  Optional additions:                                    │
│  - Agent-S for browser automation during research       │
│  - Yao for scheduling periodic re-research              │
└─────────────────────────────────────────────────────────┘
```
Requirements
- Stage 1: Research — Use GPT Researcher to conduct deep research on a topic
- Stage 2: Analysis — Use Pydantic-AI to process research into structured data with validated output types
- Stage 3: Action — Use MetaGPT's multi-agent SOP OR MS Agent Framework's graph workflow to generate deliverables from the structured analysis
- Integration: The output of one stage must be the input to the next
- Documentation: Write a README explaining your architecture and design decisions
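The stage contract in the requirements above can be made concrete with a skeleton like this, where each function body is a placeholder for the framework call named in the requirements:

```python
# Skeleton of the three-stage capstone pipeline. The bodies are placeholders:
# in the real project, research() wraps GPT Researcher, analyze() a Pydantic-AI
# agent with a validated output type, and act() a MetaGPT team or MS Agent
# Framework workflow. What matters is the contract: each stage's output is
# the next stage's input.
from dataclasses import dataclass

@dataclass
class Analysis:                 # the validated, structured hand-off format
    topic: str
    findings: list[str]
    confidence: float

def research(topic: str) -> str:
    """Stage 1: produce a raw research report (placeholder for GPT Researcher)."""
    return f"Report on {topic}: finding A; finding B"

def analyze(report: str) -> Analysis:
    """Stage 2: turn free text into structured data (placeholder for Pydantic-AI)."""
    topic = report.split(":")[0].removeprefix("Report on ").strip()
    findings = [f.strip() for f in report.split(":")[1].split(";")]
    return Analysis(topic=topic, findings=findings, confidence=0.8)

def act(analysis: Analysis) -> str:
    """Stage 3: generate a deliverable from the structure (placeholder for MetaGPT)."""
    bullets = "\n".join(f"- {f}" for f in analysis.findings)
    return f"# Deliverable: {analysis.topic}\n{bullets}"

print(act(analyze(research("open-source agent frameworks"))))
```

Defining the hand-off type (`Analysis`) first, before wiring in any framework, is the easiest way to keep the three stages independently testable.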
Stretch Goals
- Add a Yao scheduled trigger so the pipeline runs daily/weekly
- Deploy the entire pipeline to Discord/Telegram using ElizaOS patterns from Week 7
- Add observability (Logfire or OpenTelemetry)
- Add a web UI (even simple HTML)
- Use MCP to connect components
- Add Agent-S for any browser automation steps
Deliverables
- Working code at `~/agent-study/capstone/`
- `README.md` with architecture diagram and setup instructions
- `DECISIONS.md` explaining why you chose each framework for each stage
- `RETROSPECTIVE.md` — final thoughts on the 8-week journey
Suggested Topics for the Pipeline
- Competitor Analysis Tool — Research competitors → Structure findings → Generate strategic recommendations
- Daily News Briefing — Research trending topics → Analyze relevance → Generate personalized newsletter
- Technical Due Diligence — Research a technology → Structured pros/cons → Multi-perspective report (architect, PM, engineer roles)
- Market Research Report — Research a market → Structured data extraction → Executive summary + detailed report
Appendix: Comparison Matrix Template
Save this at ~/agent-study/comparison-matrix/matrix.md and fill it in weekly:
# AI Agent Framework Comparison Matrix
| Dimension | Pydantic-AI | MS Agent Framework | Agent-S | GPT Researcher | Yao | MetaGPT | ElizaOS |
|-----------|-------------|-------------------|---------|----------------|-----|---------|---------|
| **Language** | Python | Python + .NET | Python | Python | Go | Python | TypeScript |
| **Stars** | 14.6k | 7k | 9.6k | 25k | 7.5k | 63k | 17k |
| **Agent Definition** | | | | | | | |
| **Tool Integration** | | | | | | | |
| **Multi-Agent Coord.** | | | | | | | |
| **Error Handling** | | | | | | | |
| **Observability** | | | | | | | |
| **Type Safety** | | | | | | | |
| **DX / Ergonomics** | | | | | | | |
| **Production Readiness** | | | | | | | |
| **Unique Superpower** | | | | | | | |
| **Biggest Weakness** | | | | | | | |
| **Best Use Case** | | | | | | | |
| **Would I Use For...** | | | | | | | |
| **Overall Rating (1-10)** | | | | | | | |
📊 Week-by-Week Schedule Overview
| Week | Framework | Focus | Difficulty | Key Deliverables |
|---|---|---|---|---|
| 0 | Prep | Setup & background reading | ⭐ | Environment ready, papers skimmed |
| 1 | Pydantic-AI | Type-safe agents, DI, structured output | ⭐⭐ | Architecture doc, 3 agents, steal doc |
| 2 | MS Agent Framework | Graph workflows, DevUI, enterprise patterns | ⭐⭐⭐ | Graph workflow, DevUI screenshots, steal doc |
| 3 | Agent-S | Computer use, visual grounding, screenshots | ⭐⭐⭐⭐ | Computer use demo, architecture analysis |
| 4 | GPT Researcher | Deep research, Plan-and-Solve, RAG | ⭐⭐ | Research agent, MCP integration |
| 5 | Yao | Event-driven agents, Go, single binary, GraphRAG | ⭐⭐⭐⭐ | Event-driven agent, DSL exploration |
| 6 | MetaGPT | SOPs, multi-agent teams, roles/actions | ⭐⭐⭐ | Multi-agent SOP, comparison matrix |
| 7 | ElizaOS | Deployment, multi-platform distribution, plugins | ⭐⭐ | Multi-platform agent, custom plugin, deploy a Week 1-6 agent |
| 8 | Capstone | Integrate 3+ frameworks | ⭐⭐⭐⭐⭐ | Working pipeline, docs, retrospective |
🏁 Success Criteria
After completing this study plan, you should be able to:
- Explain the architecture of each framework from memory (whiteboard test)
- Build a production-grade agent with Pydantic-AI from scratch
- Design a graph workflow for a complex multi-step process
- Understand computer-use agent architecture and its limitations
- Implement a Plan-and-Solve research pipeline
- Compare event-driven vs request-response agent architectures
- Deploy an agent to Discord/Telegram and understand multi-platform routing patterns
- Choose the right framework for a given problem with clear reasoning
- Read any agent framework's source code and quickly identify its core abstractions
"The goal isn't to memorize APIs. It's to build intuition for how agent systems are designed, so you can build your own or extend existing ones with confidence."
Generated by Clawdbot | February 4, 2026