🧠 AI Agent Frameworks — 8-Week Deep Study Plan
Goal: Go from "I've heard of these" to "I could build & deploy production systems with these" in 8 weeks. Time commitment: ~1-2 hours/day, Mon-Fri. Based on: Trending Repos Deep Dive Analysis (Feb 2026). Last updated: February 4, 2026.
📋 Table of Contents
- Week 0: Prep & Prerequisites
- Week 1: Pydantic-AI — The Production SDK ⭐⭐
- Week 2: Microsoft Agent Framework — Enterprise Orchestration ⭐⭐⭐
- Week 3: Agent-S — Computer Use Pioneer ⭐⭐⭐⭐
- Week 4: GPT Researcher — Deep Research Agent ⭐⭐
- Week 5: Yao — Event-Driven Agents in Go ⭐⭐⭐⭐
- Week 6: MetaGPT — Multi-Agent SOP Framework ⭐⭐⭐
- Week 7: ElizaOS — Deployment & Multi-Platform Distribution ⭐⭐
- Week 8: Capstone Project
- Appendix: Comparison Matrix Template
⭐ = Difficulty Rating (1-5). More stars = harder week.
Week 0: Prep & Prerequisites
Timeline: The weekend before you start. ~3-4 hours total.
Environment Setup
- Python 3.11+ installed (`python --version`)
- Go 1.21+ installed for Week 5 (`go version`)
- Node.js 18+ and `pnpm` installed (needed for ElizaOS in Week 7; MetaGPT also relies on Node tooling for diagram rendering)
- Docker Desktop installed and running
- Git configured with SSH keys for cloning repos
- VS Code (or your editor) with Python + Go extensions
- A GPU or cloud GPU access (optional, helps for the Agent-S grounding model)
API Keys & Accounts
- OpenAI API key — used by almost every framework
- Anthropic API key — primary for Pydantic-AI examples
- Tavily API key — required for GPT Researcher (free tier works: app.tavily.com)
- Azure OpenAI access — needed for Microsoft Agent Framework (free trial available)
- Hugging Face account + token — needed for Agent-S grounding model
- Google API key — optional, for Gemini-based features in GPT Researcher
Workspace Setup
# Create a clean workspace for all 8 weeks
mkdir -p ~/agent-study/{week1-pydantic-ai,week2-ms-agent,week3-agent-s,week4-gpt-researcher,week5-yao,week6-metagpt,week7-elizaos,capstone}
mkdir -p ~/agent-study/notes
mkdir -p ~/agent-study/comparison-matrix
# Initialize a git repo for your study notes
cd ~/agent-study
git init
echo "# AI Agent Frameworks Study" > README.md
git add . && git commit -m "init study workspace"
Background Reading (1-2 hours)
Read these before Week 1. They're the conceptual foundation:
- Plan-and-Solve Prompting — The paper behind GPT Researcher's architecture. Skim the abstract + Section 3.
- RAG paper — Core concept used by multiple frameworks. Read abstract + intro.
- Model Context Protocol (MCP) spec — Anthropic's protocol for tool integration. Read the overview page.
- Agent2Agent (A2A) protocol — Google's agent interop standard. Skim the spec overview.
- Pydantic docs (crash course) — If you're rusty on Pydantic, spend 30 min here. It's the foundation of Week 1.
Mental Model to Build
Every agent framework answers the same 5 questions differently:
- How do you define an agent? (class, function, config, DSL)
- How do agents use tools? (function calling, MCP, code execution)
- How do multiple agents coordinate? (graph, SOP, message passing, events)
- How do you handle errors & retries? (automatic, manual, durable execution)
- How do you observe what happened? (logging, tracing, replay)
Keep these questions in mind every week. By Week 7, you'll have seven different answers for each.
Week 1: Pydantic-AI
Difficulty: ⭐⭐ (Approachable — excellent docs, familiar Python patterns) Repo: github.com/pydantic/pydantic-ai Stars: 14.6k | Language: Python | Version: v1.52.0+
Why This Is Week 1
Pydantic-AI is the most ergonomic agent framework and has the best docs. Starting here builds your mental model for how agent SDKs should feel. Everything after this week will be compared to Pydantic-AI's developer experience. It's the FastAPI of agents — you'll understand why once you use it.
Resources
| Resource | Link |
|---|---|
| 📖 Documentation | ai.pydantic.dev |
| 💬 Community (Slack) | Pydantic Slack |
| 📦 PyPI | pydantic-ai |
| 🔭 Observability | Pydantic Logfire |
| 📝 Blog: How it was built | Pydantic blog |
| 🎥 Intro video | Search "Pydantic AI tutorial 2025" on YouTube |
🗂 Source Code Guide — "Read THESE Files"
pydantic_ai_slim/pydantic_ai/
├── agent/
│ └── __init__.py # ⭐ THE file. Agent class definition, run(), run_sync(), run_stream()
├── _agent_graph.py # ⭐ Internal agent execution graph — how runs actually execute
├── tools.py # ⭐ Tool decorator, RunContext, tool schema generation
├── result.py # ⭐ RunResult, StreamedRunResult — output handling
├── models/
│ ├── __init__.py # Model ABC — how all model providers implement the same interface
│ ├── openai.py # OpenAI provider implementation
│ └── anthropic.py # Anthropic provider implementation
├── _a2a.py # Agent2Agent protocol integration
├── mcp.py # MCP client/server integration
└── _output.py # Output type handling, Pydantic validation on LLM outputs
💡 Tip: Start with `agent/__init__.py`. It's beautifully documented with docstrings. Then read `tools.py` to understand how the `@agent.tool` decorator works. Finally, read `_agent_graph.py` to see how the runtime orchestrates tool calls.
Day 1 (Monday): Architecture Deep Dive
Read:
- The full README
- Docs: Introduction
- Docs: Agents
- Docs: Models Overview
- Docs: Tools
- Docs: Output / Structured Results
- Docs: Dependency Injection (if present in the current docs), or study the DI pattern in the bank support example
Identify core abstractions:
- `Agent` — the central class (generic over deps + output type)
- `RunContext` — carries dependencies into tool functions
- `Tool` — decorated functions the LLM can call
- `ModelSettings` — per-request model configuration
- `RunResult` / `StreamedRunResult` — typed output containers
Understand the execution flow:
User prompt → Agent.run() → Model call → [Tool call → Tool execution → Model call]* → Validated output
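That loop can be sketched in a few lines of plain Python. This is a toy illustration of the control flow with a stubbed model and a fake tool, not Pydantic-AI's actual internals:

```python
from typing import Callable

def run_agent(prompt: str, model: Callable, tools: dict[str, Callable]) -> str:
    """Toy agent loop: call the model, execute any requested tool, repeat."""
    messages = [("user", prompt)]
    while True:
        action, payload = model(messages)       # stubbed LLM decision
        if action == "tool_call":
            name, args = payload
            result = tools[name](**args)        # execute the tool
            messages.append(("tool", result))   # feed the result back to the model
        else:
            return payload                      # final output

# Stub model: first asks for a tool call, then produces an answer.
def stub_model(messages):
    if messages[-1][0] == "user":
        return "tool_call", ("add", {"a": 2, "b": 3})
    return "final", f"The sum is {messages[-1][1]}"

answer = run_agent("What is 2 + 3?", stub_model, {"add": lambda a, b: a + b})
print(answer)  # The sum is 5
```

When you read `_agent_graph.py` on Day 4, look for this same loop with validation and retries layered on top.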
- 📝 Homework: Write a 1-page architecture summary at `~/agent-study/notes/week1-architecture.md`
  - Cover: agent lifecycle, the dependency injection pattern, how tools are registered and called, how output validation works
  - Draw a simple diagram (ASCII or a photo of a hand-drawn sketch is fine)
Day 2 (Tuesday): Hello World + Core Concepts
Setup:
cd ~/agent-study/week1-pydantic-ai
python -m venv .venv && source .venv/bin/activate
pip install pydantic-ai
Run the quickstart:
from pydantic_ai import Agent

agent = Agent(
    'anthropic:claude-sonnet-4-0',
    instructions='Be concise, reply with one sentence.',
)
result = agent.run_sync('Where does "hello world" come from?')
print(result.output)
Understand the core API surface:
- `agent.run()` vs `agent.run_sync()` vs `agent.run_stream()`
- How `instructions` work (static string vs dynamic function)
- How model selection works (string shorthand vs model objects)
- How `result.output` is typed
- 📝 Homework: Build the simplest agent from scratch — NO copy-paste
  - Requirements: takes a topic, returns a structured output (use a Pydantic model as the output type)
  - Must use at least one custom instruction
  - Save at `~/agent-study/week1-pydantic-ai/hello_agent.py`
Day 3 (Wednesday): Intermediate Build — Structured Output + DI
Focus: Pydantic-AI's killer features — type-safe structured output and dependency injection
Work through:
- The bank support agent example from the docs
- Docs: Structured Output / Streamed Results
- Docs: Graph Support
Key concepts to grok:
- How `RunContext[DepsType]` carries typed dependencies
- How Pydantic models as output types create validated, structured responses
- How tool docstrings become the tool description sent to the LLM
- How streaming works with structured output (partial validation!)
- 📝 Homework: Build an agent that uses the framework's unique capabilities:
  - Must include: dependency injection with a real dependency (database mock, API client, etc.)
  - Must include: structured output via a Pydantic model (not just string output)
  - Must include: at least 2 tools
  - Example idea: a "recipe finder" agent with deps for a recipe database, tools for searching and filtering, output as a structured `Recipe` model
  - Save at `~/agent-study/week1-pydantic-ai/structured_agent.py`
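If you want a framework-free warm-up first, here is the DI + validated-output concept in stdlib-only Python. All names here (`Recipe`, `Deps`, the tool) are hypothetical; in Pydantic-AI the deps travel in `RunContext` and validation is done by your Pydantic output model:

```python
import json
from dataclasses import dataclass

@dataclass
class Recipe:
    name: str
    minutes: int

@dataclass
class Deps:
    """The 'injected' dependency: in Pydantic-AI this rides in RunContext."""
    recipe_db: dict[str, int]

def search_tool(deps: Deps, query: str) -> str:
    """Tool: look up cook times in the injected database."""
    hits = {k: v for k, v in deps.recipe_db.items() if query in k}
    return json.dumps(hits)

def validate_output(raw: str) -> Recipe:
    """Parse the model's JSON and enforce the schema (Pydantic does this for you)."""
    data = json.loads(raw)
    if not isinstance(data.get("minutes"), int):
        raise ValueError("minutes must be an int")  # would trigger a model retry
    return Recipe(name=data["name"], minutes=data["minutes"])

deps = Deps(recipe_db={"lentil soup": 40, "lentil salad": 15})
tool_result = search_tool(deps, "soup")
recipe = validate_output('{"name": "lentil soup", "minutes": 40}')
print(recipe)
```

Once this clicks, the Pydantic-AI version is the same shape with the boilerplate handled for you.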
Day 4 (Thursday): Advanced Patterns + Source Code Reading
Read these source files (in order):
- `pydantic_ai_slim/pydantic_ai/agent/__init__.py` — how the `Agent` class is defined, the generic type parameters
- `pydantic_ai_slim/pydantic_ai/tools.py` — how `@tool` works, schema generation, `RunContext`
- `pydantic_ai_slim/pydantic_ai/_agent_graph.py` — the internal execution engine
- `pydantic_ai_slim/pydantic_ai/result.py` — how results are wrapped, streamed, validated
- `pydantic_ai_slim/pydantic_ai/models/__init__.py` — the model provider ABC
Understand:
- How errors from tool execution are passed back to the LLM for retry
- How streaming works internally (incremental Pydantic validation)
- How `_agent_graph.py` orchestrates the conversation loop
- How durable execution checkpoints work
Explore advanced features:
- Docs: Durable Execution
- Docs: MCP Integration
- Docs: Human-in-the-Loop
- Docs: Evals
- 📝 Homework: Write "What I'd Steal from Pydantic-AI" at `~/agent-study/notes/week1-steal.md`
  - Focus on: DI pattern, type-safe generics, streaming validation, tool retry pattern
  - What design decisions are genius? What would you do differently?
Day 5 (Friday): Integration Project + Reflection
- Build a mini-project that integrates with something real:
  - Suggested: an agent that queries a real API (weather, GitHub, Hacker News), processes the data through tools, and returns a structured report as a Pydantic model
  - Bonus: add Logfire observability (there's a free tier) and inspect the traces
  - Bonus: expose it as an MCP server
  - Save at `~/agent-study/week1-pydantic-ai/integration_project/`
- Write retrospective at `~/agent-study/notes/week1-retro.md`:
  - Strengths of Pydantic-AI
  - Weaknesses / gaps you noticed
  - When would you reach for this vs building from scratch?
  - What surprised you?
- Start comparison matrix at `~/agent-study/comparison-matrix/matrix.md` (see template)
🎯 Key Questions — You Should Be Able to Answer:
- What does the `Agent` class generic signature `Agent[DepsType, OutputType]` buy you?
- How does dependency injection work in Pydantic-AI and why is it better than global state?
- How does Pydantic-AI validate structured output from an LLM that returns free-form text?
- What happens when a tool call fails? How does the retry loop work?
- What's the difference between `run()`, `run_sync()`, and `run_stream()`?
- How would you add a new model provider to Pydantic-AI?
- What is durable execution and when would you use it?
Week 2: Microsoft Agent Framework
Difficulty: ⭐⭐⭐ (Larger surface area, graph concepts, mono-repo navigation) Repo: github.com/microsoft/agent-framework Stars: 7k | Languages: Python + .NET | Born from: Semantic Kernel + AutoGen
Why This Is Week 2
If Pydantic-AI is the developer's choice, Microsoft Agent Framework is the enterprise's choice. It introduces graph-based workflows — a fundamentally different orchestration model from the simple agent loop you learned in Week 1. Understanding this framework means understanding where corporate AI agent development is heading.
Resources
| Resource | Link |
|---|---|
| 📖 Documentation | learn.microsoft.com/agent-framework |
| 🚀 Quick Start | Quick Start Tutorial |
| 💬 Discord | Discord |
| 🎥 Intro Video (30 min) | YouTube |
| 🎥 DevUI Demo (1 min) | YouTube |
| 📦 PyPI | agent-framework |
| 📝 Migration from SK | Semantic Kernel Migration |
| 📝 Migration from AutoGen | AutoGen Migration |
🗂 Source Code Guide
python/packages/
├── agent-framework/ # ⭐ Core package — agents, middleware, workflows
│ └── src/agent_framework/
│ ├── agents/ # Agent base classes and implementations
│ ├── workflows/ # ⭐ Graph-based workflow engine
│ └── middleware/ # ⭐ Request/response middleware pipeline
├── azure-ai/ # Azure AI provider (Responses API)
├── openai/ # OpenAI provider
├── anthropic/ # Anthropic provider
├── devui/ # ⭐ Developer UI for debugging workflows
├── mcp/ # MCP integration
├── a2a/ # Agent2Agent protocol
└── lab/ # Experimental features (benchmarking, RL)
python/samples/getting_started/
├── agents/ # ⭐ Start here — basic agent examples
├── workflows/ # ⭐ Graph workflow examples (critical!)
├── middleware/ # Middleware examples
└── observability/ # OpenTelemetry integration
💡 Tip: This is a mono-repo. Don't try to read everything. Focus on `python/packages/agent-framework/` for the core, and `python/samples/getting_started/workflows/` for the graph workflow examples.
Day 1 (Monday): Architecture Deep Dive
Read:
- Overview
- The full README
- User Guide Overview
- Watch the 30-min intro video (at 1.5x speed)
- Skim the SK migration guide to understand lineage
Identify core abstractions:
- `Agent` — base agent interface
- `Workflow` / `Graph` — the graph-based orchestration system
- `Middleware` — request/response processing pipeline
- `AgentProvider` — LLM provider abstraction
- `DevUI` — visual debugging tool
Key architectural insight: This framework uses a data-flow graph model where nodes are agents or functions, and edges carry data between them. This is fundamentally different from Pydantic-AI's linear agent loop.
- 📝 Homework: Write a 1-page architecture summary at `~/agent-study/notes/week2-architecture.md`
  - Compare the graph workflow model to Pydantic-AI's linear model
  - Draw the graph workflow concept (nodes = agents/functions, edges = data flow)
Day 2 (Tuesday): Hello World + Core Concepts
Setup:
cd ~/agent-study/week2-ms-agent
python -m venv .venv && source .venv/bin/activate
pip install agent-framework --pre
# You'll need Azure credentials or an OpenAI key
Run the quickstart:
import asyncio

from agent_framework.openai import OpenAIChatClient

async def main():
    agent = OpenAIChatClient(
        api_key="your-key"
    ).as_agent(
        name="HaikuBot",
        instructions="You are an upbeat assistant that writes beautifully.",
    )
    print(await agent.run("Write a haiku about AI agents."))

asyncio.run(main())
Understand:
- The `as_agent()` pattern — how providers become agents
- The difference between Chat agents and Responses agents
- How the Python API differs from the .NET API (skim a .NET example)
- 📝 Homework: Build the simplest agent from scratch — NO copy-paste
  - Save at `~/agent-study/week2-ms-agent/hello_agent.py`
Day 3 (Wednesday): Intermediate Build — Graph Workflows
This is the key differentiator. This is the day that matters.
Work through:
- `python/samples/getting_started/workflows/` — all examples
- Docs: Workflow/Graph tutorials on learn.microsoft.com
- Understand streaming, checkpointing, and time-travel in graphs
Key concepts:
- How nodes in a graph can be agents OR deterministic functions
- How data flows between nodes via typed edges
- How checkpointing enables pause/resume of long-running workflows
- How human-in-the-loop fits into the graph model
- How time-travel lets you replay/debug workflows
- 📝 Homework: Build a graph workflow:
  - Must include: at least 3 nodes (a mix of agent nodes and function nodes)
  - Must include: branching logic (conditional edges)
  - Example idea: a "content pipeline" — Node 1 (agent: research a topic) → Node 2 (function: format research) → Node 3 (agent: write blog post), with a branch for "needs more research"
  - Save at `~/agent-study/week2-ms-agent/graph_workflow.py`
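Before reaching for the real workflow API, the node/edge/branch idea can be sketched framework-free. The node names and the `facts < 4` threshold below are made up for illustration:

```python
# Toy graph: nodes are callables sharing a state dict; a router picks the edge.

def research(state: dict) -> dict:
    state["facts"] = state.get("facts", 0) + 2   # pretend an agent gathered 2 facts
    return state

def needs_more(state: dict) -> str:
    """Conditional edge: loop back to research until we have enough facts."""
    return "research" if state["facts"] < 4 else "write"

def write(state: dict) -> dict:
    state["post"] = f"Blog post built from {state['facts']} facts"
    return state

NODES = {"research": research, "write": write}

def run_graph(state: dict) -> dict:
    node = "research"
    while True:
        state = NODES[node](state)
        if node == "write":      # terminal node
            return state
        node = needs_more(state)

final = run_graph({})
print(final["post"])  # Blog post built from 4 facts
```

In the real framework the state dict becomes typed edge data, and checkpointing means this loop can be paused and resumed between nodes.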
Day 4 (Thursday): Advanced Patterns + Source Code Reading
Read these source files:
- Core agent base classes in `python/packages/agent-framework/`
- Workflow/graph engine implementation
- Middleware pipeline implementation
- DevUI package structure
- At least one provider implementation (OpenAI or Azure)
Explore:
- Set up and run the DevUI — visualize your graph workflow from Day 3
- Look at the OpenTelemetry integration — `python/samples/getting_started/observability/`
- Read the middleware examples — understand the request/response pipeline
- Check out the lab package — what's experimental?
- 📝 Homework: Write "What I'd Steal from MS Agent Framework" at `~/agent-study/notes/week2-steal.md`
  - Focus on: graph workflow model, DevUI concept, middleware pipeline, multi-language support
  - Compare to Pydantic-AI: when would you choose one over the other?
Day 5 (Friday): Integration Project + Reflection
- Build a mini-project:
  - Suggested: a multi-step data processing pipeline using graph workflows
  - Must have: at least one agent node calling an LLM, at least one pure function node, checkpointing enabled
  - Bonus: get the DevUI running and screenshot your workflow visualization
  - Save at `~/agent-study/week2-ms-agent/integration_project/`
- Write retrospective at `~/agent-study/notes/week2-retro.md`
- Update comparison matrix — add an MS Agent Framework entry
🎯 Key Questions:
- What's the difference between a linear agent loop and a graph-based workflow?
- How does checkpointing work in MS Agent Framework workflows?
- What does "time-travel" mean in the context of agent debugging?
- How does the middleware pipeline work and when would you use it?
- What's the DevUI and what can you debug with it that you can't with logs alone?
- How does this framework's agent abstraction compare to Pydantic-AI's `Agent` class?
- When would you choose MS Agent Framework over Pydantic-AI? (Think: team size, workflow complexity, language requirements)
Week 3: Agent-S
Difficulty: ⭐⭐⭐⭐ (Requires GPU for grounding model, novel paradigm, research-grade code) Repo: github.com/simular-ai/Agent-S Stars: 9.6k | Language: Python | Papers: ICLR 2025, COLM 2025
Why This Is Week 3
This is a completely different paradigm. Weeks 1-2 were about agents that work with APIs and text. Agent-S works with pixels and clicks — it uses your computer like a human does. This is the frontier of agent development. Understanding Agent-S means understanding where computer-use agents are heading.
Resources
| Resource | Link |
|---|---|
| 📖 Repo | github.com/simular-ai/Agent-S |
| 💬 Discord | Discord |
| 📄 S1 Paper (ICLR 2025) | arxiv.org/abs/2410.08164 |
| 📄 S2 Paper (COLM 2025) | arxiv.org/abs/2504.00906 |
| 📄 S3 Paper | arxiv.org/abs/2510.02250 |
| 🌐 S3 Blog | simular.ai/articles/agent-s3 |
| 🎥 S3 Video | YouTube |
| 📦 PyPI | gui-agents |
| 🤗 Grounding Model | UI-TARS-1.5-7B |
🗂 Source Code Guide
gui_agents/
├── s3/ # ⭐ Latest version — start here
│ ├── cli_app.py # ⭐ Entry point — CLI application, main loop
│ ├── agents/ # ⭐ Agent implementations (planning, grounding, execution)
│ ├── core/ # ⭐ Core abstractions (screenshot, actions, state)
│ ├── bbon/ # Behavior Best-of-N — sampling strategy for better performance
│ └── prompts/ # System prompts for each agent role
├── s2/ # Previous version
├── s2_5/ # Intermediate version
├── s1/ # Original version (ICLR 2025)
└── utils.py # Shared utilities
💡 Tip: Focus entirely on `gui_agents/s3/`. Read the papers' system diagrams first, THEN the code. The code makes 10x more sense with the paper's architecture diagram in front of you.
⚠️ Setup Note: Agent-S requires a grounding model (UI-TARS-1.5-7B). You can host it on Hugging Face Inference Endpoints (~$1-2/hr for A10G), use a free tier if available, or run it locally if you have a capable GPU (16GB+ VRAM). Alternatively, study the code architecture without running the full system.
Day 1 (Monday): Architecture Deep Dive
Read:
- The full README
- S3 blog post — accessible overview
- S1 Paper (at least abstract + Sections 1-3) — core architecture concepts
- S3 Paper (abstract + architecture section) — latest improvements
- `models.md` in the repo — supported model configurations
Identify core abstractions:
- Screenshot Capture — the agent "sees" the screen as an image
- Grounding Model (UI-TARS) — converts screenshots to UI element locations
- Planning Agent — decides what to do based on current screen + goal
- Execution Agent — translates plans into mouse/keyboard actions
- Behavior Best-of-N (bBoN) — run multiple rollouts, pick the best
The pipeline:
Task → Screenshot → Grounding (UI-TARS: identify elements) → Planning (LLM: what to do) → Action (click/type/scroll) → New Screenshot → Loop
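Stubbed out in Python, the loop looks like this. The "grounding" and "planning" functions here are trivial stand-ins for UI-TARS and the planner LLM:

```python
# Toy computer-use loop mirroring the pipeline above.

def screenshot(world: dict) -> str:
    return world["screen"]                      # stand-in for a pixel capture

def ground(image: str) -> dict:
    # Stub grounding model: "find" a button's coordinates in the image text.
    return {"submit_button": (100, 200)} if "Submit" in image else {}

def plan(elements: dict, goal: str) -> tuple:
    if goal == "submit form" and "submit_button" in elements:
        return ("click", elements["submit_button"])
    return ("done", None)

def act(world: dict, action: tuple) -> None:
    if action[0] == "click":
        world["screen"] = "Form submitted"      # the click changes the screen

world = {"screen": "Form with a Submit button"}
steps = []
for _ in range(5):                              # bounded loop, like max_steps
    action = plan(ground(screenshot(world)), "submit form")
    steps.append(action[0])
    if action[0] == "done":
        break
    act(world, action)
print(steps)  # ['click', 'done']
```

Notice that the agent never sees app internals, only the screen: every iteration starts from a fresh observation, which is why grounding quality dominates performance.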
- 📝 Homework: Write architecture summary at `~/agent-study/notes/week3-architecture.md`
  - Include the screenshot → grounding → planning → action pipeline
  - Explain bBoN and why it matters (72.6% vs 66% on OSWorld)
  - Compare: how is "seeing" a screen different from "calling" an API?
Day 2 (Tuesday): Hello World + Core Concepts
Setup:
cd ~/agent-study/week3-agent-s
python -m venv .venv && source .venv/bin/activate
pip install gui-agents
brew install tesseract  # Required dependency (macOS; on Linux, install via your package manager)
API configuration:
export OPENAI_API_KEY=<your-key>
export ANTHROPIC_API_KEY=<your-key>
export HF_TOKEN=<your-huggingface-token>
Run Agent-S3 (if you have grounding model access):
agent_s \
--provider openai \
--model gpt-4o \
--ground_provider huggingface \
--ground_url <your-endpoint-url> \
--ground_model ui-tars-1.5-7b \
--grounding_width 1920 \
--grounding_height 1080
If you can't run it: read through `gui_agents/s3/cli_app.py` line by line and trace the execution flow. Understand what WOULD happen at each step.
- 📝 Homework: Even if you can't run the full agent, build a minimal screenshot → analysis script:
  - Take a screenshot, send it to a vision model, and get a description of UI elements — this exercises the same "visual grounding" concept, just simplified
  - Save at `~/agent-study/week3-agent-s/hello_agent.py`
Day 3 (Wednesday): Intermediate Build — Understanding Computer Use
Work through:
- Read `gui_agents/s3/agents/` — understand the multi-agent architecture
- Read `gui_agents/s3/core/` — how screenshots are captured and actions are executed
- Study the prompt templates in `gui_agents/s3/` — how the LLM is instructed
- Understand the bBoN strategy in `gui_agents/s3/bbon/`
Key concepts:
- How screenshots are processed and annotated for the LLM
- How the grounding model converts visual elements to coordinates
- How actions (click, type, scroll) are executed at the OS level
- Cross-platform differences (Linux/Mac/Windows)
- The local coding environment feature
- 📝 Homework: Build something that uses the computer-use paradigm:
  - Option A (with GPU): give Agent-S a simple task (open a browser, search for something, copy a result)
  - Option B (without GPU): build a simplified "screen reader" agent that takes a screenshot, uses a vision model to understand the UI, and outputs a structured description of what's on screen + suggested next actions
  - Save at `~/agent-study/week3-agent-s/computer_use_demo/`
Day 4 (Thursday): Advanced Patterns + Source Code Reading
Read these source files (in order):
- `gui_agents/s3/cli_app.py` — main entry point, execution loop
- `gui_agents/s3/agents/` — each agent role (planner, executor, grounding)
- `gui_agents/s3/core/` — screenshot capture, action execution, state management
- `gui_agents/s3/bbon/` — Behavior Best-of-N implementation
- `gui_agents/s1/` (briefly) — compare S1 architecture to S3 to see the evolution
Explore the papers' techniques:
- How does "experience-augmented hierarchical planning" work? (S1)
- What's the "Mixture of Grounding" approach? (S2)
- How does S3 achieve simplicity while improving performance?
- 📝 Homework: Write "What I'd Steal from Agent-S" at `~/agent-study/notes/week3-steal.md`
  - Focus on: the screenshot → grounding → action pipeline, bBoN strategy, cross-platform abstractions
  - Think about: could you add computer-use capabilities to a Pydantic-AI agent as a tool?
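The core Behavior Best-of-N idea fits in a few lines: run N independent rollouts and let a judge pick the winner. The real judge is an LLM comparing behavior narratives; the stub below just takes the max over fake scores:

```python
import random

def rollout(seed: int) -> dict:
    """Stub rollout: a trajectory with a success score the judge can compare."""
    rng = random.Random(seed)                   # deterministic per-seed "behavior"
    return {"seed": seed, "score": rng.random()}

def judge(trajectories: list[dict]) -> dict:
    # Real bBoN uses an LLM judge over behavior narratives; here, max score.
    return max(trajectories, key=lambda t: t["score"])

trajectories = [rollout(s) for s in range(5)]
best = judge(trajectories)
print(best["seed"])
```

The trade-off to notice: N rollouts cost N times the compute, so bBoN buys reliability with latency and spend.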
Day 5 (Friday): Integration Project + Reflection
- Build a mini-project:
  - Suggested: a "screen monitoring" agent that periodically screenshots your desktop, uses a vision model to understand what's happening, and logs structured summaries (using Pydantic-AI for the structured output!)
  - Alternative: build a browser automation agent using Playwright + a vision model (a simplified version of Agent-S's approach)
  - Save at `~/agent-study/week3-agent-s/integration_project/`
- Write retrospective at `~/agent-study/notes/week3-retro.md`
- Update comparison matrix
🎯 Key Questions:
- What is the screenshot → grounding → action pipeline and why is it powerful?
- Why does Agent-S need a separate grounding model (UI-TARS) in addition to the planning LLM?
- What is Behavior Best-of-N and how does it improve performance by ~6%?
- How is computer-use fundamentally different from API-based agent frameworks?
- What are the security implications of an agent that can control your mouse and keyboard?
- What's the difference between Agent-S's approach and Anthropic's Computer Use or OpenAI's Operator?
- When would you use computer-use agents vs. API-based agents? Give 3 examples of each.
Week 4: GPT Researcher
Difficulty: ⭐⭐ (Straightforward architecture, well-documented, familiar patterns) Repo: github.com/assafelovic/gpt-researcher Stars: 25k | Language: Python
Why This Is Week 4
After 3 weeks of studying how agents work internally, this week is about studying a complete, purpose-built agent that does one thing extremely well: research. GPT Researcher is the best example of the "Plan-and-Solve + RAG" pattern — a design you'll reuse in your own projects.
Resources
| Resource | Link |
|---|---|
| 📖 Documentation | docs.gptr.dev |
| 💬 Discord | Discord |
| 📦 PyPI | gpt-researcher |
| 📝 Blog: How it was built | docs.gptr.dev/blog |
| 🎥 Demo | YouTube |
| 🔧 MCP Integration | MCP Guide |
| 📜 Plan-and-Solve Paper | arxiv.org/abs/2305.04091 |
🗂 Source Code Guide
gpt_researcher/
├── agent.py # ⭐ THE file. GPTResearcher class — the entire research orchestration
├── actions/ # ⭐ Research actions (generate questions, search, scrape, synthesize)
│ ├── query_processing.py # How research questions are generated from the user query
│ ├── web_search.py # Web search execution
│ └── report_generation.py # Final report synthesis
├── config/ # Configuration management
│ └── config.py # All configurable parameters
├── context/ # ⭐ Context management — how gathered info is stored/retrieved
│ └── compression.py # How context is compressed to fit token limits
├── document/ # Document processing (PDF, web pages, etc.)
├── memory/ # ⭐ Research memory — how the agent remembers what it's found
├── orchestrator/ # ⭐ Deep research — recursive tree exploration
│ └── agent/ # Sub-agents for deep research mode
├── retrievers/ # ⭐ Web/local search implementations (Tavily, DuckDuckGo, MCP, etc.)
└── scraper/ # Web scraping implementations
💡 Tip: `agent.py` is the heart. It's one file, ~700 lines, and it contains the entire research orchestration. Read it top to bottom. Then read `actions/` to understand each step.
Day 1 (Monday): Architecture Deep Dive
Read:
- Full README
- How it was built — the design blog post
- Getting Started
- Customization docs
Understand the Plan-and-Solve architecture:
User Query
→ Planner Agent: Generate N research questions
→ For each question:
→ Crawler Agent: Search web, gather sources
→ Summarizer: Extract relevant info from each source
→ Source tracker: Track citations
→ Publisher Agent: Aggregate all findings into a report
Deep Research mode adds recursion:
User Query → Generate sub-topics → For each sub-topic → Generate deeper sub-topics → ... → Aggregate bottom-up
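A toy version of that recursion makes the tree shape concrete. The planner and crawler are stubs, and the branching factor and depth are arbitrary:

```python
# Toy Deep Research: expand sub-topics to a fixed depth, aggregate bottom-up.

def generate_subtopics(topic: str) -> list[str]:
    return [f"{topic} / sub{i}" for i in range(2)]   # stub planner, branching = 2

def research_leaf(topic: str) -> str:
    return f"findings({topic})"                      # stub crawler + summarizer

def deep_research(topic: str, depth: int) -> str:
    if depth == 0:
        return research_leaf(topic)                  # leaves do the actual research
    children = [deep_research(t, depth - 1) for t in generate_subtopics(topic)]
    return f"report({topic}: " + "; ".join(children) + ")"   # aggregate upward

report = deep_research("AI agents", depth=2)
print(report.count("findings("))  # 4 — branching factor 2 at depth 2
```

The cost implication is worth internalizing: leaves grow exponentially with depth, which is why real Deep Research runs are slow and token-hungry.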
- 📝 Homework: Write architecture summary at `~/agent-study/notes/week4-architecture.md`
Day 2 (Tuesday): Hello World + Core Concepts
Setup:
cd ~/agent-study/week4-gpt-researcher
python -m venv .venv && source .venv/bin/activate
pip install gpt-researcher
# Set required API keys
export OPENAI_API_KEY=<your-key>
export TAVILY_API_KEY=<your-key>
Run the simplest version:
import asyncio

from gpt_researcher import GPTResearcher

async def main():
    query = "What are the latest advancements in AI agent frameworks in 2025-2026?"
    researcher = GPTResearcher(query=query)
    research_result = await researcher.conduct_research()
    report = await researcher.write_report()
    print(report)

asyncio.run(main())
Also try the web UI:
git clone https://github.com/assafelovic/gpt-researcher.git
cd gpt-researcher
pip install -r requirements.txt
python -m uvicorn main:app --reload
# Visit http://localhost:8000
- 📝 Homework: Build a minimal research agent from scratch — NO copy-paste
  - Save at `~/agent-study/week4-gpt-researcher/hello_researcher.py`
Day 3 (Wednesday): Intermediate Build — Deep Research + MCP
Focus: GPT Researcher's key differentiators — Deep Research mode and MCP integration
Work through:
- Deep Research docs
- MCP Integration Guide
- Local document research
- Run a Deep Research query and observe the recursive tree exploration
Key concepts:
- How Deep Research recursively explores sub-topics
- How MCP connects GPT Researcher to external data sources
- How context compression prevents token limit issues
- How source tracking and citations work
- The difference between web research and local document research
- 📝 Homework: Build a research agent that uses GPT Researcher's unique capabilities:
  - Must include: MCP integration with at least one external source (e.g., a GitHub MCP server)
  - OR: research over local documents (PDFs, markdown files from your study notes)
  - Bonus: use Deep Research mode for a complex topic
  - Save at `~/agent-study/week4-gpt-researcher/deep_research_demo.py`
Day 4 (Thursday): Advanced Patterns + Source Code Reading
Read these source files (in order):
- `gpt_researcher/agent.py` — the entire `GPTResearcher` class, top to bottom
- `gpt_researcher/actions/query_processing.py` — how research questions are generated
- `gpt_researcher/context/compression.py` — how context is managed within token limits
- `gpt_researcher/orchestrator/` — deep research recursive tree implementation
- `gpt_researcher/retrievers/` — how different search providers are integrated
Understand:
- How the planner decomposes a query into research questions
- How the agent handles rate limiting and API failures
- How context compression works (this is critical for long research)
- How the orchestrator manages the recursive tree in Deep Research mode
- How the report generator synthesizes multiple sources into a coherent report
- 📝 Homework: Write "What I'd Steal from GPT Researcher" at `~/agent-study/notes/week4-steal.md`
  - Focus on: Plan-and-Solve decomposition, context compression, source tracking, recursive exploration
  - Compare: how would you build a "deep research" capability into a Pydantic-AI agent?
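Context compression reduces, conceptually, to scoring gathered chunks against the query and keeping the best within a token budget. Below is a stub sketch (word-overlap scoring, a character budget), not GPT Researcher's actual implementation:

```python
def compress(chunks: list[str], query: str, budget: int) -> list[str]:
    """Keep highest-overlap chunks whose combined length fits the budget."""
    def score(chunk: str) -> int:
        # Crude relevance proxy: shared words with the query.
        return len(set(chunk.lower().split()) & set(query.lower().split()))
    kept, used = [], 0
    for chunk in sorted(chunks, key=score, reverse=True):
        if used + len(chunk) <= budget:   # budget stands in for a token limit
            kept.append(chunk)
            used += len(chunk)
    return kept

chunks = [
    "agent frameworks compared in depth",
    "unrelated cooking tips",
    "agent orchestration patterns",
]
kept = compress(chunks, "agent frameworks", budget=40)
print(kept)  # ['agent frameworks compared in depth']
```

When you read `context/compression.py`, note what it uses as the relevance signal and how it counts tokens, then compare it to this naive version.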
Day 5 (Friday): Integration Project + Reflection
- Build a mini-project:
  - Suggested: a "competitive analysis" agent — given a company/product, it researches competitors, pricing, and features, and generates a structured comparison report. Use GPT Researcher's engine + Pydantic-AI for structured output.
  - Alternative: install GPT Researcher as a Claude Skill and use it in your Claude workflow
  - Save at `~/agent-study/week4-gpt-researcher/integration_project/`
- Write retrospective at `~/agent-study/notes/week4-retro.md`
- Update comparison matrix
🎯 Key Questions:
- What is the Plan-and-Solve pattern and how does GPT Researcher implement it?
- How does Deep Research differ from regular research? Draw the tree structure.
- How does context compression prevent token limit issues during long research?
- How does GPT Researcher track and cite sources?
- What search providers does GPT Researcher support and how do you add a new one?
- How could you combine GPT Researcher with Pydantic-AI for structured research outputs?
- What are the limitations of automated research (hallucination, bias, recency)?
Week 5: Yao
Difficulty: ⭐⭐⭐⭐ (Go language, novel architecture, less documentation, paradigm shift) Repo: github.com/YaoApp/yao Stars: 7.5k | Language: Go | Runtime: Single binary with V8 engine
Why This Is Week 5
Yao is the most architecturally unique repo in the entire study. It's not a chatbot framework — it's an autonomous agent engine where agents are triggered by events, schedules, and emails. This is the only Go-based framework, the only one with event-driven architecture, and the only one that deploys as a single binary. If everything else is "AI assistant," Yao is "AI team member."
⚠️ Language Note: This week requires Go. If you don't know Go, spend an extra hour on Day 1 doing the Go Tour. You don't need to be fluent — just enough to read the source code.
Resources
| Resource | Link |
|---|---|
| 🏠 Homepage | yaoapps.com |
| 📖 Documentation | yaoapps.com/docs |
| 🚀 Quick Start | Getting Started |
| ✨ Why Yao? | Why Yao |
| 🤖 Agent Examples | YaoAgents/awesome |
| 📦 Install Script | `curl -fsSL https://yaoapps.com/install.sh \| bash` |
| 🐹 Go Tour (if needed) | go.dev/tour |
🗂 Source Code Guide
```
yao/
├── engine/
│   └── process.go        # ⭐ Process engine — core concept in Yao
├── agent/                # ⭐ Agent framework — autonomous agent definitions
│   ├── agent.go          # Agent lifecycle, trigger modes, execution phases
│   └── triggers/         # Clock, Human, Event trigger implementations
├── runtime/
│   └── v8/               # ⭐ Built-in V8 JavaScript/TypeScript engine
├── rag/
│   └── graph/            # ⭐ Built-in GraphRAG implementation
├── mcp/                  # MCP integration
├── api/                  # HTTP server and REST API
├── model/                # ORM and database layer
└── cmd/
    └── yao/
        └── main.go       # Application entry point
```
💡 Tip: Yao's DSL-based approach means you'll be reading `.yao` files (YAML-like definitions) as much as Go source code. The mental model is: you define agents as data (DSL), and the engine executes them.
Day 1 (Monday): Architecture Deep Dive
Read:
- Full README
- Why Yao?
- Documentation overview
- Skim the Go source: `cmd/yao/main.go` → `engine/process.go` → `agent/agent.go`
Understand Yao's radical differences:
| Traditional Agent | Yao Agent |
|---|---|
| Entry point: chatbox | Entry point: email, events, schedules |
| Passive: you ask, it answers | Proactive: it works autonomously |
| Role: tool | Role: team member |
The six-phase execution model:
Inspiration → Goals → Tasks → Run → Deliver → Learn
Three trigger modes:
- Clock — scheduled tasks (cron-like)
- Human — triggered by email or messages
- Event — triggered by webhooks or database changes
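The trigger modes above can be pictured as different event sources feeding one and the same execution pipeline. A minimal stdlib-Python sketch (hypothetical names, not Yao's actual Go API):

```python
# Hypothetical sketch (not Yao's API): the three trigger modes are just
# different event sources feeding the same agent execution pipeline.
from dataclasses import dataclass

@dataclass
class Trigger:
    mode: str      # "clock" | "human" | "event"
    payload: dict

def run_agent(trigger: Trigger) -> str:
    """One pass through a simplified execution model."""
    goal = f"handle {trigger.mode} trigger"            # Inspiration → Goals
    tasks = [f"process {k}" for k in trigger.payload]  # Goals → Tasks
    results = [t + ": done" for t in tasks]            # Run
    return f"[{goal}] " + "; ".join(results)           # Deliver

# The same agent reacts to a cron tick, an inbound email, or a webhook:
print(run_agent(Trigger("clock", {"schedule": "daily 9am"})))
print(run_agent(Trigger("human", {"email": "please summarize Q3"})))
print(run_agent(Trigger("event", {"webhook": {"order_id": 42}})))
```

The point of the sketch: in an event-driven design the agent's entry point is a trigger object, not a chat message, and the trigger's mode only determines when the pipeline fires, not what the pipeline is.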
- 📝 Homework: Write architecture summary at `~/agent-study/notes/week5-architecture.md`
  - Focus on: How the event-driven model is fundamentally different from request-response
  - Compare: 6-phase execution vs Pydantic-AI's run loop vs MS Agent Framework's graph
Day 2 (Tuesday): Hello World + Core Concepts
Setup:

```bash
# Install Yao (single binary!)
curl -fsSL https://yaoapps.com/install.sh | bash

# Create a project
cd ~/agent-study/week5-yao
mkdir project && cd project
yao start   # First run creates project structure
# Visit http://127.0.0.1:5099
```

Run your first process:

```bash
yao run utils.app.Ping                     # Returns version
yao run scripts.tests.Hello 'Hello, Yao!'  # Run TypeScript
yao run models.tests.pet.Find 1 '::{}'     # Query database
```
Understand core concepts:
- Processes — functions that can be run directly or referenced in code
- Models — database models defined in `.mod.yao` files
- Scripts — TypeScript/JavaScript code executed by the built-in V8 engine
- DSL — Yao's declarative syntax for defining everything
- 📝 Homework: Build the simplest Yao application from scratch:
  - Define a model, write a process, create a simple API endpoint
  - Save project at `~/agent-study/week5-yao/hello_project/`
Day 3 (Wednesday): Intermediate Build — Event-Driven Agents
Focus: What makes Yao unique — event-driven, proactive agents
Work through:
- Agent configuration — defining agents with roles and triggers
- Setting up a scheduled (Clock) trigger
- Setting up an Event trigger (webhook → agent action)
- MCP integration — connecting external tools
- GraphRAG — how the built-in knowledge graph works
Key concepts:
- How agents are defined declaratively (vs. programmatically in Python frameworks)
- How the three trigger modes work in practice
- How agents learn from past executions (the "Learn" phase)
- How GraphRAG combines vector search with graph traversal
- Why a single binary matters for deployment
- 📝 Homework: Build an event-driven agent:
  - Must include: At least 2 different trigger modes (e.g., Clock + Event)
  - Must include: An agent that does something proactively (not just responding to a chat)
  - Example idea: An agent that checks an RSS feed on a schedule (Clock), processes new articles (Run), and stores summaries in the knowledge base (Learn/Deliver)
  - Save at `~/agent-study/week5-yao/event_agent/`
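A toy illustration of the GraphRAG idea: similarity search finds seed documents, then a hop through an entity graph surfaces related facts that similarity alone would miss. This is a stdlib sketch with word overlap standing in for embeddings, not Yao's implementation:

```python
import re

# Illustrative GraphRAG-style retrieval (not Yao's implementation): a
# similarity search finds seed documents, then a walk over an entity graph
# pulls in related facts that vector search alone would miss.

docs = {
    "d1": "Ada Lovelace wrote the first algorithm.",
    "d2": "Charles Babbage designed the Analytical Engine.",
    "d3": "Grace Hopper developed the first compiler.",
}
graph = {  # entity -> related entities
    "Ada Lovelace": ["Charles Babbage"],
    "Charles Babbage": ["Analytical Engine"],
}
entities_in = {"d1": ["Ada Lovelace"], "d2": ["Charles Babbage"], "d3": ["Grace Hopper"]}

def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def vector_search(query: str, k: int = 1) -> list[str]:
    """Stand-in for embedding search: rank docs by word overlap, keep top-k."""
    q = tokens(query)
    return sorted(docs, key=lambda d: -len(q & tokens(docs[d])))[:k]

def graph_expand(doc_ids: list[str]) -> set[str]:
    """Follow one hop of entity relations out from the seed documents."""
    related: set[str] = set()
    for d in doc_ids:
        for ent in entities_in[d]:
            related.update(graph.get(ent, []))
    return related

seeds = vector_search("who wrote the first algorithm")
print(seeds)                # the seed document found by similarity
print(graph_expand(seeds))  # the graph hop surfaces the related "Charles Babbage"
```

A real GraphRAG pipeline extracts the entity graph automatically and walks more than one hop, but the hybrid structure (retrieve, then traverse) is the same.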
Day 4 (Thursday): Advanced Patterns + Source Code Reading
Read these source files (in order):
- `cmd/yao/main.go` — Application entry point, how the single binary initializes
- `engine/process.go` — The process engine (core execution abstraction)
- `agent/agent.go` — Agent lifecycle and execution phases
- `runtime/v8/` — How the V8 engine is embedded for TypeScript support
- `rag/graph/` — GraphRAG implementation (vector + graph hybrid search)
Understand:
- How Go's concurrency model (goroutines) enables event-driven agents
- How the V8 engine is embedded and used for TypeScript execution
- How GraphRAG combines embedding search with entity-relationship traversal
- How a single Go binary includes all these features without external dependencies
- 📝 Homework: Write "What I'd Steal from Yao" at `~/agent-study/notes/week5-steal.md`
  - Focus on: Event-driven architecture, single binary deployment, GraphRAG, DSL approach
  - Think about: Could you add event-driven capabilities to a Python agent framework?
Day 5 (Friday): Integration Project + Reflection
- Build a mini-project:
  - Suggested: A "daily briefing" agent — schedule it to run every morning, have it gather data from APIs (weather, calendar, news), process it, and output a structured briefing. Use the Clock trigger + MCP for external data.
  - Alternative: Build a webhook-triggered agent that processes incoming data and stores it in GraphRAG
  - Save at `~/agent-study/week5-yao/integration_project/`
- Write retrospective at `~/agent-study/notes/week5-retro.md`
- Update comparison matrix
🎯 Key Questions:
- How does Yao's event-driven model differ from the request-response model of every other framework?
- What are the three trigger modes and when would you use each?
- What is the six-phase execution model and how does the "Learn" phase create a feedback loop?
- Why is single-binary deployment a significant advantage? Where would you deploy Yao that you couldn't deploy Python frameworks?
- How does Yao's built-in GraphRAG differ from vector-only RAG?
- What does it mean that Yao embeds a V8 engine? What are the implications for extensibility?
- What types of applications is Yao best suited for vs. worst suited for?
Week 6: MetaGPT
Difficulty: ⭐⭐⭐ (Large codebase, academic concepts, multi-agent complexity) Repo: github.com/FoundationAgents/MetaGPT Stars: 63k | Language: Python | Papers: ICLR 2024 + many more
Why This Is Week 6
MetaGPT is the OG multi-agent framework and the final framework deep dive of the study. It introduces Standard Operating Procedures (SOPs) as the coordination mechanism — a genuinely novel idea that maps human organizational structures onto AI agents. By Week 6, you have enough context from the previous 5 frameworks to deeply appreciate what MetaGPT does differently.
Resources
| Resource | Link |
|---|---|
| 📖 Documentation | docs.deepwisdom.ai |
| 💬 Discord | Discord |
| 📦 PyPI | metagpt |
| 🎯 MGX (commercial product) | mgx.dev |
| 📄 MetaGPT Paper (ICLR 2024) | openreview.net |
| 📄 AFlow Paper (ICLR 2025 Oral) | openreview.net |
| 📝 Agent 101 Tutorial | Agent 101 |
| 📝 MultiAgent 101 | MultiAgent 101 |
| 🤗 HuggingFace Demo | MetaGPT Space |
🗂 Source Code Guide
```
metagpt/
├── roles/                       # ⭐ Role definitions — each role = one agent with a job
│   ├── role.py                  # ⭐ Base Role class — THE core abstraction
│   ├── architect.py             # Software architect agent
│   ├── engineer.py              # Software engineer agent
│   ├── product_manager.py       # Product manager agent
│   ├── project_manager.py       # Project manager agent
│   └── di/
│       └── data_interpreter.py  # Data analysis agent
├── actions/                     # ⭐ Action definitions — what roles can do
│   ├── action.py                # Base Action class
│   ├── write_prd.py             # Write Product Requirements Document
│   ├── write_design.py          # Write system design
│   └── write_code.py            # Write code
├── team.py                      # ⭐ Team orchestration — how roles collaborate via SOPs
├── environment.py               # ⭐ Shared environment — message passing between roles
├── schema.py                    # Message schemas for inter-role communication
├── config2.py                   # Configuration management
├── base/                        # Base classes and utilities
├── memory/                      # Memory management for roles
├── software_company.py          # ⭐ The "software company" end-to-end pipeline
└── utils/
    └── project_repo.py          # Project repository management
```
💡 Tip: The mental model is: Role (who) performs Actions (what) according to SOPs (how). Read `roles/role.py` first, then `actions/action.py`, then `team.py`. That's the holy trinity of MetaGPT.
Day 1 (Monday): Architecture Deep Dive
Read:
- Full README
- Agent 101 Tutorial
- MultiAgent 101 Tutorial
- MetaGPT paper (abstract + Sections 1-3) — the SOP concept
- Skim the AFlow paper abstract — automated workflow generation
Core philosophy: Code = SOP(Team)
Identify core abstractions:
- Role — an agent with a specific job (PM, architect, engineer, etc.)
- Action — a discrete task a role can perform (write PRD, write code, etc.)
- SOP — Standard Operating Procedures that define the workflow between roles
- Team — the orchestrator that manages roles and message passing
- Environment — shared context where roles publish and subscribe to messages
- Message — typed communication between roles
The "software company" pipeline:
```
User Requirement
→ Product Manager (writes PRD)
→ Architect (writes system design)
→ Project Manager (creates task breakdown)
→ Engineer (writes code)
→ QA (tests code)
```
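The pipeline above can be sketched as a pub/sub loop: each role watches one message type and publishes another, so the PRD → Design → Code ordering emerges from subscriptions rather than an explicit graph. Names and classes here are illustrative stdlib Python, not MetaGPT's real API:

```python
# Minimal sketch of SOP-style coordination (illustrative, not MetaGPT's API):
# each role consumes one message kind and publishes another, so the workflow
# order emerges from subscriptions, not from an explicit graph definition.
from collections import deque
from dataclasses import dataclass

@dataclass
class Message:
    kind: str      # e.g. "Requirement", "PRD", "Design", "Code"
    content: str

class Role:
    watches: str = ""
    produces: str = ""
    def act(self, msg: Message) -> Message:
        return Message(self.produces, f"{self.produces} for: {msg.content}")

class ProductManager(Role):
    watches, produces = "Requirement", "PRD"

class Architect(Role):
    watches, produces = "PRD", "Design"

class Engineer(Role):
    watches, produces = "Design", "Code"

def run_team(roles: list[Role], requirement: str) -> list[str]:
    """Shared environment as a queue: whichever role watches a message's kind
    consumes it and publishes its own output. That hand-off chain is the SOP."""
    queue = deque([Message("Requirement", requirement)])
    log = []
    while queue:
        msg = queue.popleft()
        log.append(f"{msg.kind}: {msg.content}")
        for role in roles:
            if role.watches == msg.kind:
                queue.append(role.act(msg))
    return log

for line in run_team([ProductManager(), Architect(), Engineer()], "snake game"):
    print(line)
```

Notice that no code says "PM runs before Architect" — reordering the roles list changes nothing, because the sequence is encoded entirely in what each role watches and produces.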
- 📝 Homework: Write architecture summary at `~/agent-study/notes/week6-architecture.md`
  - Explain the SOP model and how it maps to human organizations
  - Compare: SOP coordination vs Graph workflows (MS) vs Event-driven (Yao) vs Linear (Pydantic-AI)
Day 2 (Tuesday): Hello World + Core Concepts
Setup:

```bash
cd ~/agent-study/week6-metagpt
conda create -n metagpt python=3.11 && conda activate metagpt
pip install --upgrade metagpt
metagpt --init-config  # Creates ~/.metagpt/config2.yaml
# Edit the config to add your API key
```

Run the classic demo:

```bash
metagpt "Create a snake game"  # This will generate a full project in ./workspace
```
Also try programmatically:

```python
from metagpt.software_company import generate_repo
from metagpt.utils.project_repo import ProjectRepo

repo: ProjectRepo = generate_repo("Create a simple calculator app")
print(repo)
```
And try the Data Interpreter:

```python
import asyncio
from metagpt.roles.di.data_interpreter import DataInterpreter

async def main():
    di = DataInterpreter()
    await di.run("Run data analysis on sklearn Iris dataset, include a plot")

asyncio.run(main())
```
- 📝 Homework: Build a custom role from scratch — NO copy-paste:
  - Define a new `Role` subclass with custom `Action`s
  - Example: a "ResearchAnalyst" role that takes a topic and produces a structured analysis
  - Save at `~/agent-study/week6-metagpt/hello_role.py`
Day 3 (Wednesday): Intermediate Build — Multi-Agent SOPs
Focus: MetaGPT's unique capability — SOP-based multi-agent coordination
Work through:
- MultiAgent 101
- Look at the Debate example
- Understand how messages flow between roles via the Environment
- Understand how the SOP defines which role acts after which
Key concepts:
- How roles subscribe to message types from other roles
- How the Team orchestrator manages turn-taking
- How the Environment enables publish/subscribe communication
- How SOPs encode workflow logic without explicit graph definitions
- The difference between the "software company" SOP and custom SOPs
- 📝 Homework: Build a multi-agent system with a custom SOP:
  - Must include: At least 3 custom roles with different responsibilities
  - Must include: Custom message types between roles
  - Must include: A clear SOP workflow (Role A → Role B → Role C)
  - Example idea: A "content creation team" — Researcher (gathers info) → Writer (drafts article) → Editor (reviews and improves) → Publisher (formats final output)
  - Save at `~/agent-study/week6-metagpt/multi_agent_sop.py`
Day 4 (Thursday): Advanced Patterns + Source Code Reading
Read these source files (in order):
- `metagpt/roles/role.py` — Base Role class, how roles think and act
- `metagpt/actions/action.py` — Base Action class, how actions execute
- `metagpt/team.py` — Team orchestration, turn management
- `metagpt/environment.py` — Message passing, pub/sub system
- `metagpt/schema.py` — Message types and schemas
Also explore:
- `metagpt/roles/engineer.py` — how the Engineer role writes code (complex action chain)
- `metagpt/software_company.py` — the end-to-end pipeline
- `metagpt/memory/` — how roles maintain memory across turns
- `examples/` — AFlow and SPO implementations
Advanced concepts:
- How does AFlow (Automated Agentic Workflow Generation) work?
- What is SPO (Self-Supervised Prompt Optimization)?
- How does the Data Interpreter differ from the Software Company pipeline?
- 📝 Homework: Write "What I'd Steal from MetaGPT" at `~/agent-study/notes/week6-steal.md`
  - Focus on: SOP-based coordination, Role/Action abstraction, message-passing environment
  - Reflect on: Which coordination model do you prefer? Graph (MS) vs SOP (MetaGPT) vs Event (Yao)?
Day 5 (Friday): Integration Project + Final Reflection
- Build a mini-project:
  - Suggested: A multi-agent system that takes a business idea and produces a full analysis: Market Researcher role → Business Analyst role → Financial Modeler role → Report Writer role. Each produces a structured output that feeds into the next.
  - Save at `~/agent-study/week6-metagpt/integration_project/`
- Write final retrospective at `~/agent-study/notes/week6-retro.md`
  - This one should be more comprehensive — reflect on ALL 6 weeks
  - What framework would you reach for first? When?
  - What surprised you most across the study?
- Complete comparison matrix — all 6 frameworks
- Commit and push everything to your study git repo
🎯 Key Questions:
- What does "Code = SOP(Team)" mean concretely?
- How does the Role/Action/SOP model map to real organizational structures?
- How do messages flow between roles? What's the pub/sub mechanism?
- What's the difference between MetaGPT's approach and MS Agent Framework's graph workflows?
- How does the Data Interpreter feature differ from the Software Company pipeline?
- What is AFlow and why was it accepted as an oral presentation at ICLR 2025?
- When would you use MetaGPT vs simpler single-agent frameworks?
- Across all 6 frameworks, which coordination model (linear/graph/SOP/event) do you think is most general?
Week 7: ElizaOS
Timeline: 1 week | Difficulty: ⭐⭐ | Goal: Learn agent deployment & multi-platform distribution Repo: elizaOS/eliza | ⭐ 17,476 | TypeScript Why this week: Weeks 1-6 taught you how to BUILD agents. This week teaches you how to DEPLOY them where users actually are.
Why ElizaOS Makes The Cut
After a thorough debate (see the deep dive analysis), ElizaOS earned its spot because:
- It's the only deployment-focused platform on the trending list — multi-platform routing (Discord, Telegram, Twitter, Farcaster) in one framework
- 17k stars with active development and a large community
- The plugin architecture, character system, and platform adapters teach real deployment patterns you won't learn from any other framework studied
- Knowing how to ship agents to where users live is as important as knowing how to build them
Resources
| Resource | URL |
|---|---|
| GitHub | https://github.com/elizaOS/eliza |
| Docs | https://elizaos.github.io/eliza/ |
| Discord | https://discord.gg/elizaos |
| Quickstart | https://elizaos.github.io/eliza/docs/quickstart |
Key Source Files to Read
| File | Why It Matters |
|---|---|
| `packages/core/src/runtime.ts` | The AgentRuntime — the central brain that coordinates everything |
| `packages/core/src/types.ts` | All the core interfaces (Character, Memory, Action, Provider, Evaluator) |
| `packages/plugin-discord/src/index.ts` | How a platform adapter is built — the Discord integration |
| `packages/plugin-telegram/src/index.ts` | Compare with Discord adapter — spot the platform abstraction pattern |
| `packages/core/src/memory.ts` | Memory management — how agents maintain context across platforms |
| `agent/src/index.ts` | The entry point — how everything gets wired together |
Day 1 (Monday): Architecture Deep Dive — The Deployment Platform
Study (1-2 hrs):
- Read the full README and quickstart docs
- Understand the core architecture:
- Character files — how agent personalities are defined (JSON-based)
- AgentRuntime — the central coordinator
- Plugins — how platform adapters, actions, and providers are registered
- Actions vs Evaluators vs Providers — the three extension points
- Memory — how conversation state persists across platforms
- Study the plugin system architecture — how does one agent connect to Discord AND Telegram simultaneously?
- Understand the character file format — what can you configure?
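The runtime/adapter split described above can be sketched in a few lines. This is illustrative Python with invented names (ElizaOS itself is TypeScript, and its real interfaces differ): the point is that platform adapters stay thin while the runtime owns the character and the memory, which is what lets one agent answer on two platforms at once:

```python
# Sketch of the deployment pattern (Python for consistency with earlier weeks;
# ElizaOS is TypeScript, and these names are illustrative, not its real API):
# one runtime + one memory store behind several thin platform adapters.

class AgentRuntime:
    def __init__(self, character: dict):
        self.character = character
        self.memory: list[tuple[str, str]] = []   # (platform, message), shared!

    def handle(self, platform: str, text: str) -> str:
        self.memory.append((platform, text))
        seen = len(self.memory)
        return f"[{self.character['name']}] ({seen} msgs remembered) re: {text}"

class PlatformAdapter:
    """Each adapter only knows how to receive/format for its platform;
    all agent logic and state live in the shared runtime."""
    def __init__(self, name: str, runtime: AgentRuntime):
        self.name, self.runtime = name, runtime

    def on_message(self, text: str) -> str:
        return self.runtime.handle(self.name, text)

runtime = AgentRuntime({"name": "Eliza", "bio": "helpful, curious"})
discord = PlatformAdapter("discord", runtime)
telegram = PlatformAdapter("telegram", runtime)

print(discord.on_message("hi there"))        # memory count: 1
print(telegram.on_message("remember me?"))   # memory count: 2, shared across platforms
```

Adding a third platform means writing one more thin adapter, not touching the agent; that is the economics of the deployment-platform approach.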
Key Questions:
- How does ElizaOS route a message from Discord to the right agent and back?
- What's the difference between an Action, an Evaluator, and a Provider?
- How does the memory system work across platforms? Can an agent remember a Discord convo when talking on Telegram?
- How does the character file influence agent behavior vs hard-coded logic?
Homework:
- Write a 1-page architecture summary covering: runtime → plugins → adapters → memory → character system
- Draw a diagram showing message flow: User sends Discord message → ... → Agent responds
- Compare the architecture to Pydantic-AI's approach — what's different about a "deployment-first" vs "logic-first" framework?
Day 2 (Tuesday): Hello World — Deploy an Agent to Discord
Study (1-2 hrs):
- Set up the ElizaOS development environment:
  - Clone the repo, install deps (`pnpm install`)
  - Create a Discord bot in the Discord Developer Portal (you'll need a test server)
  - Set up your `.env` with Discord bot token and an LLM API key
- Create a custom character file for your agent:
- Define name, bio, personality traits, example conversations
- Set the model provider and platform connections
- Run the agent locally, verify it responds in Discord
Homework:
- Create a character file from scratch (no copy-paste from examples) — give it a distinct personality
- Deploy the agent to your Discord test server and have a 10-message conversation with it
- Screenshot the conversation and note: What worked? What felt off? How does character configuration affect responses?
Day 3 (Wednesday): Multi-Platform + Plugin System
Study (1-2 hrs):
- Add a second platform — connect the same agent to Telegram (or Twitter)
- Same character, same agent, two platforms simultaneously
- Observe: does memory carry across? How does the agent handle platform-specific features?
- Study the plugin architecture:
  - Read how `plugin-discord` and `plugin-telegram` are structured
  - Understand the `Plugin` interface — what does a plugin provide?
  - Look at how Actions work — these are the agent's "tools"
- Write a custom Action plugin:
- Something simple: a weather lookup, a file reader, or a joke generator
- Register it and verify your agent can use it on both platforms
Homework:
- Run your agent on 2 platforms simultaneously — screenshot both conversations
- Build a custom Action plugin from scratch and verify it works
- Write a comparison: how does ElizaOS's plugin system compare to Pydantic-AI's tool system and MetaGPT's action system? What are the trade-offs?
Day 4 (Thursday): Source Code Reading + Advanced Patterns
Study (1-2 hrs):
- Read the key source files from the table above, focusing on:
- runtime.ts — How does the AgentRuntime process an incoming message? What's the evaluation pipeline?
- types.ts — What are all the interfaces? How extensible is the system?
- memory.ts — How is conversation history stored and retrieved? What's the embedding strategy?
- Study advanced patterns:
- Multi-agent setups — can you run multiple agents with different characters?
- Custom evaluators — how do you add post-processing logic?
- Custom providers — how do you inject context into every agent response?
- Compare deployment architecture decisions:
- How does ElizaOS handle rate limiting across platforms?
- How does it handle platform-specific message formatting (embeds, buttons, etc.)?
- What's the error handling strategy when a platform adapter fails?
Homework:
- Write a "What I'd Steal From ElizaOS" doc — which patterns are worth using in your own projects? Think:
- Character file abstraction for agent personality
- Plugin registration pattern
- Platform adapter interface
- Memory routing across services
- Identify the 3 biggest architectural weaknesses (every framework has them)
Day 5 (Friday): Integration Project — Deploy a Week 1-6 Agent
The real test: Take an agent you built in Weeks 1-6 and deploy it to at least one chat platform using patterns learned from ElizaOS.
Options (pick one):
- Pydantic-AI agent → Discord: Take your structured-output agent from Week 1 and wrap it in a Discord bot using ElizaOS's adapter patterns (or build your own minimal adapter inspired by their architecture)
- GPT Researcher → Telegram: Take your research agent from Week 4 and make it accessible via Telegram — users send a topic, agent researches and responds
- Multi-framework pipeline → Discord: Take your Week 6 MetaGPT multi-agent setup and expose it through a Discord interface where users can kick off the SOP workflow
Homework:
- Deploy a previously-built agent to a real chat platform — it must respond to real messages
- Write a retrospective for ElizaOS:
- Strengths: What does it do better than building your own deployment layer?
- Weaknesses: Where is it limited or frustrating?
- When to use: What type of project benefits most from ElizaOS?
- When to skip: When is it overkill or the wrong tool?
- Update the comparison matrix with the ElizaOS column
- Answer: "If I were building a production agent for a client, would I use ElizaOS for deployment or roll my own? Why?"
Key Questions You Should Be Able to Answer After Week 7
- How does ElizaOS's character system differ from hardcoding agent personalities?
- What's the plugin registration lifecycle — from `Plugin` definition to runtime availability?
- How would you add a completely new platform (e.g., Slack, WhatsApp) to ElizaOS?
- What are the trade-offs of a deployment-platform approach vs building bespoke platform integrations?
- How does multi-platform memory work — and where does it break down?
- When is ElizaOS the right choice vs a simple Discord.js bot?
- What deployment patterns from ElizaOS would you steal for a custom agent pipeline?
Week 8: Capstone Project
Timeline: 1 week | Difficulty: ⭐⭐⭐⭐⭐ | Goal: Synthesize learnings from 3+ frameworks
The Project: "Research → Analyze → Act" Pipeline
Build a system that combines at least 3 of the frameworks you studied:
Recommended Architecture
```
┌─────────────────────────────────────────────────────────┐
│                   Capstone Pipeline                     │
│                                                         │
│  ┌──────────────┐   ┌──────────────┐   ┌────────────┐   │
│  │ GPT          │   │ Pydantic-AI  │   │ MetaGPT OR │   │
│  │ Researcher   │──▶│ Structured   │──▶│ MS Agent   │   │
│  │ (Research)   │   │ Analysis     │   │ Framework  │   │
│  │              │   │ Agent        │   │ (Execute)  │   │
│  └──────────────┘   └──────────────┘   └────────────┘   │
│                                                         │
│  Optional additions:                                    │
│  - Agent-S for browser automation during research       │
│  - Yao for scheduling periodic re-research              │
└─────────────────────────────────────────────────────────┘
```
Requirements
- Stage 1: Research — Use GPT Researcher to conduct deep research on a topic
- Stage 2: Analysis — Use Pydantic-AI to process research into structured data with validated output types
- Stage 3: Action — Use MetaGPT's multi-agent SOP OR MS Agent Framework's graph workflow to generate deliverables from the structured analysis
- Integration: The output of one stage must be the input to the next
- Documentation: Write a README explaining your architecture and design decisions
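The stage contract in the requirements above can be made concrete with a skeleton like this, where each function body is a placeholder for the framework call named in the requirements:

```python
# Skeleton of the three-stage capstone pipeline. The bodies are placeholders:
# in the real project, research() wraps GPT Researcher, analyze() a Pydantic-AI
# agent with a validated output type, and act() a MetaGPT team or MS Agent
# Framework workflow. What matters is the contract: each stage's output is
# the next stage's input.
from dataclasses import dataclass

@dataclass
class Analysis:                 # the validated, structured hand-off format
    topic: str
    findings: list[str]
    confidence: float

def research(topic: str) -> str:
    """Stage 1: produce a raw research report (placeholder for GPT Researcher)."""
    return f"Report on {topic}: finding A; finding B"

def analyze(report: str) -> Analysis:
    """Stage 2: turn free text into structured data (placeholder for Pydantic-AI)."""
    topic = report.split(":")[0].removeprefix("Report on ").strip()
    findings = [f.strip() for f in report.split(":")[1].split(";")]
    return Analysis(topic=topic, findings=findings, confidence=0.8)

def act(analysis: Analysis) -> str:
    """Stage 3: generate a deliverable from the structure (placeholder for MetaGPT)."""
    bullets = "\n".join(f"- {f}" for f in analysis.findings)
    return f"# Deliverable: {analysis.topic}\n{bullets}"

print(act(analyze(research("open-source agent frameworks"))))
```

Defining the hand-off type (`Analysis`) first, before wiring in any framework, is the easiest way to keep the three stages independently testable.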
Stretch Goals
- Add a Yao scheduled trigger so the pipeline runs daily/weekly
- Deploy the entire pipeline to Discord/Telegram using ElizaOS patterns from Week 7
- Add observability (Logfire or OpenTelemetry)
- Add a web UI (even simple HTML)
- Use MCP to connect components
- Add Agent-S for any browser automation steps
Deliverables
- Working code at `~/agent-study/capstone/`
- `README.md` with architecture diagram and setup instructions
- `DECISIONS.md` explaining why you chose each framework for each stage
- `RETROSPECTIVE.md` — final thoughts on the 8-week journey
Suggested Topics for the Pipeline
- Competitor Analysis Tool — Research competitors → Structure findings → Generate strategic recommendations
- Daily News Briefing — Research trending topics → Analyze relevance → Generate personalized newsletter
- Technical Due Diligence — Research a technology → Structured pros/cons → Multi-perspective report (architect, PM, engineer roles)
- Market Research Report — Research a market → Structured data extraction → Executive summary + detailed report
Appendix: Comparison Matrix Template
Save this at ~/agent-study/comparison-matrix/matrix.md and fill it in weekly:
# AI Agent Framework Comparison Matrix
| Dimension | Pydantic-AI | MS Agent Framework | Agent-S | GPT Researcher | Yao | MetaGPT | ElizaOS |
|-----------|-------------|-------------------|---------|----------------|-----|---------|---------|
| **Language** | Python | Python + .NET | Python | Python | Go | Python | TypeScript |
| **Stars** | 14.6k | 7k | 9.6k | 25k | 7.5k | 63k | 17k |
| **Agent Definition** | | | | | | | |
| **Tool Integration** | | | | | | | |
| **Multi-Agent Coord.** | | | | | | | |
| **Error Handling** | | | | | | | |
| **Observability** | | | | | | | |
| **Type Safety** | | | | | | | |
| **DX / Ergonomics** | | | | | | | |
| **Production Readiness** | | | | | | | |
| **Unique Superpower** | | | | | | | |
| **Biggest Weakness** | | | | | | | |
| **Best Use Case** | | | | | | | |
| **Would I Use For...** | | | | | | | |
| **Overall Rating (1-10)** | | | | | | | |
📊 Week-by-Week Schedule Overview
| Week | Framework | Focus | Difficulty | Key Deliverables |
|---|---|---|---|---|
| 0 | Prep | Setup & background reading | ⭐ | Environment ready, papers skimmed |
| 1 | Pydantic-AI | Type-safe agents, DI, structured output | ⭐⭐ | Architecture doc, 3 agents, steal doc |
| 2 | MS Agent Framework | Graph workflows, DevUI, enterprise patterns | ⭐⭐⭐ | Graph workflow, DevUI screenshots, steal doc |
| 3 | Agent-S | Computer use, visual grounding, screenshots | ⭐⭐⭐⭐ | Computer use demo, architecture analysis |
| 4 | GPT Researcher | Deep research, Plan-and-Solve, RAG | ⭐⭐ | Research agent, MCP integration |
| 5 | Yao | Event-driven agents, Go, single binary, GraphRAG | ⭐⭐⭐⭐ | Event-driven agent, DSL exploration |
| 6 | MetaGPT | SOPs, multi-agent teams, roles/actions | ⭐⭐⭐ | Multi-agent SOP, comparison matrix |
| 7 | ElizaOS | Deployment, multi-platform distribution, plugins | ⭐⭐ | Multi-platform agent, custom plugin, deploy a Week 1-6 agent |
| 8 | Capstone | Integrate 3+ frameworks | ⭐⭐⭐⭐⭐ | Working pipeline, docs, retrospective |
🏁 Success Criteria
After completing this study plan, you should be able to:
- Explain the architecture of each framework from memory (whiteboard test)
- Build a production-grade agent with Pydantic-AI from scratch
- Design a graph workflow for a complex multi-step process
- Understand computer-use agent architecture and its limitations
- Implement a Plan-and-Solve research pipeline
- Compare event-driven vs request-response agent architectures
- Deploy an agent to Discord/Telegram and understand multi-platform routing patterns
- Choose the right framework for a given problem with clear reasoning
- Read any agent framework's source code and quickly identify its core abstractions
"The goal isn't to memorize APIs. It's to build intuition for how agent systems are designed, so you can build your own or extend existing ones with confidence."
Generated by Clawdbot | February 4, 2026