# 🧠 AI Agent Frameworks — 8-Week Deep Study Plan
> **Goal:** Go from "I've heard of these" to "I could build & deploy production systems with these" in 8 weeks.
> **Time commitment:** ~1-2 hours/day, Mon-Fri
> **Based on:** [Trending Repos Deep Dive Analysis](./trending-repos-deep-dive.md) (Feb 2026)
> **Last updated:** February 4, 2026
---
## 📋 Table of Contents
- [Week 0: Prep & Prerequisites](#week-0-prep--prerequisites)
- [Week 1: Pydantic-AI](#week-1-pydantic-ai) — The Production SDK ⭐⭐
- [Week 2: Microsoft Agent Framework](#week-2-microsoft-agent-framework) — Enterprise Orchestration ⭐⭐⭐
- [Week 3: Agent-S](#week-3-agent-s) — Computer Use Pioneer ⭐⭐⭐⭐
- [Week 4: GPT Researcher](#week-4-gpt-researcher) — Deep Research Agent ⭐⭐
- [Week 5: Yao](#week-5-yao) — Event-Driven Agents in Go ⭐⭐⭐⭐
- [Week 6: MetaGPT](#week-6-metagpt) — Multi-Agent SOP Framework ⭐⭐⭐
- [Week 7: ElizaOS](#week-7-elizaos) — Deployment & Multi-Platform Distribution ⭐⭐
- [Week 8: Capstone Project](#week-8-capstone-project)
- [Appendix: Comparison Matrix Template](#appendix-comparison-matrix-template)
> ⭐ = Difficulty Rating (1-5). More stars = harder week.
---
## Week 0: Prep & Prerequisites
> **Timeline:** The weekend before you start. ~3-4 hours total.
### Environment Setup
- [ ] **Python 3.11+** installed (`python --version`)
- [ ] **Go 1.21+** installed for Week 5 (`go version`)
- [ ] **Node.js 18+** and `pnpm` installed (needed for ElizaOS, and for building Yao's UI from source)
- [ ] **Docker Desktop** installed and running
- [ ] **Git** configured with SSH keys for cloning repos
- [ ] **VS Code** (or your editor) with Python + Go extensions
- [ ] **A GPU or cloud GPU access** (optional, helps for Agent-S grounding model)
### API Keys & Accounts
- [ ] **OpenAI API key** — used by almost every framework
- [ ] **Anthropic API key** — primary for Pydantic-AI examples
- [ ] **Tavily API key** — required for GPT Researcher (free tier works: [app.tavily.com](https://app.tavily.com))
- [ ] **Azure OpenAI access** — needed for Microsoft Agent Framework (free trial available)
- [ ] **Hugging Face account + token** — needed for Agent-S grounding model
- [ ] **Google API key** — optional, for Gemini-based features in GPT Researcher
### Workspace Setup
```bash
# Create a clean workspace for all 8 weeks
mkdir -p ~/agent-study/{week1-pydantic-ai,week2-ms-agent,week3-agent-s,week4-gpt-researcher,week5-yao,week6-metagpt,week7-elizaos,capstone}
mkdir -p ~/agent-study/notes
mkdir -p ~/agent-study/comparison-matrix
# Initialize a git repo for your study notes
cd ~/agent-study
git init
echo "# AI Agent Frameworks Study" > README.md
git add . && git commit -m "init study workspace"
```
### Background Reading (1-2 hours)
Read these before Week 1. They're the conceptual foundation:
- [ ] **[Plan-and-Solve Prompting](https://arxiv.org/abs/2305.04091)** — The paper behind GPT Researcher's architecture. Skim the abstract + Section 3.
- [ ] **[RAG paper](https://arxiv.org/abs/2005.11401)** — Core concept used by multiple frameworks. Read abstract + intro.
- [ ] **[Model Context Protocol (MCP) spec](https://modelcontextprotocol.io/)** — Anthropic's protocol for tool integration. Read the overview page.
- [ ] **[Agent2Agent (A2A) protocol](https://google.github.io/A2A/)** — Google's agent interop standard. Skim the spec overview.
- [ ] **[Pydantic docs (crash course)](https://docs.pydantic.dev/latest/concepts/models/)** — If you're rusty on Pydantic, spend 30 min here. It's the foundation of Week 1.
### Mental Model to Build
Every agent framework answers the same 5 questions differently:
1. **How do you define an agent?** (class, function, config, DSL)
2. **How do agents use tools?** (function calling, MCP, code execution)
3. **How do multiple agents coordinate?** (graph, SOP, message passing, events)
4. **How do you handle errors & retries?** (automatic, manual, durable execution)
5. **How do you observe what happened?** (logging, tracing, replay)
Keep these questions in mind every week. By Week 7, you'll have seven different answers for each.
---
## Week 1: Pydantic-AI
> **Difficulty:** ⭐⭐ (Approachable — excellent docs, familiar Python patterns)
> **Repo:** [github.com/pydantic/pydantic-ai](https://github.com/pydantic/pydantic-ai)
> **Stars:** 14.6k | **Language:** Python | **Version:** v1.52.0+
### Why This Is Week 1
Pydantic-AI is the most ergonomic agent framework and has the best docs. Starting here builds your mental model for how agent SDKs *should* feel. Everything after this week will be compared to Pydantic-AI's developer experience. It's the FastAPI of agents — you'll understand why once you use it.
### Resources
| Resource | Link |
|----------|------|
| 📖 Documentation | [ai.pydantic.dev](https://ai.pydantic.dev/) |
| 💬 Community (Slack) | [Pydantic Slack](https://logfire.pydantic.dev/docs/join-slack/) |
| 📦 PyPI | [pydantic-ai](https://pypi.org/project/pydantic-ai/) |
| 🔭 Observability | [Pydantic Logfire](https://pydantic.dev/logfire) |
| 📝 Blog: How it was built | [Pydantic blog](https://pydantic.dev/articles) |
| 🎥 Intro video | Search "Pydantic AI tutorial 2025" on YouTube |
### 🗂 Source Code Guide — "Read THESE Files"
```
pydantic_ai_slim/pydantic_ai/
├── agent/
│   └── __init__.py     # ⭐ THE file. Agent class definition, run(), run_sync(), run_stream()
├── _agent_graph.py     # ⭐ Internal agent execution graph — how runs actually execute
├── tools.py            # ⭐ Tool decorator, RunContext, tool schema generation
├── result.py           # ⭐ RunResult, StreamedRunResult — output handling
├── models/
│   ├── __init__.py     # Model ABC — how all model providers implement the same interface
│   ├── openai.py       # OpenAI provider implementation
│   └── anthropic.py    # Anthropic provider implementation
├── _a2a.py             # Agent2Agent protocol integration
├── mcp.py              # MCP client/server integration
└── _output.py          # Output type handling, Pydantic validation on LLM outputs
```
> **💡 Tip:** Start with `agent/__init__.py`. It's beautifully documented with docstrings. Then read `tools.py` to understand how the `@agent.tool` decorator works. Finally, read `_agent_graph.py` to see how the runtime orchestrates tool calls.
---
### Day 1 (Monday): Architecture Deep Dive
**Read:**
- [ ] The full [README](https://github.com/pydantic/pydantic-ai)
- [ ] Docs: [Introduction](https://ai.pydantic.dev/)
- [ ] Docs: [Agents](https://ai.pydantic.dev/agents)
- [ ] Docs: [Models Overview](https://ai.pydantic.dev/models/overview)
- [ ] Docs: [Tools](https://ai.pydantic.dev/tools)
- [ ] Docs: [Output / Structured Results](https://ai.pydantic.dev/output)
- [ ] Docs: [Dependency Injection](https://ai.pydantic.dev/dependencies) — if that page has moved, study the DI pattern in the bank support example instead
**Identify core abstractions:**
- `Agent` — the central class (generic over deps + output type)
- `RunContext` — carries dependencies into tool functions
- `Tool` — decorated functions the LLM can call
- `ModelSettings` — per-request model configuration
- `RunResult` / `StreamedRunResult` — typed output containers
**Understand the execution flow:**
```
User prompt → Agent.run() → Model call → [Tool call → Tool execution → Model call]* → Validated output
```
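This loop can be modeled in plain Python before you ever open the source. The sketch below is a toy, not pydantic-ai's actual implementation: `ToolCall`, `FinalAnswer`, `call_model`, and `validate` are all illustrative names, with a scripted fake model standing in for the LLM.

```python
from dataclasses import dataclass

@dataclass
class ToolCall:
    name: str
    args: dict

@dataclass
class FinalAnswer:
    text: str

def run(prompt, call_model, tools, validate):
    messages = [("user", prompt)]
    while True:
        step = call_model(messages)                  # model decides: tool call or final answer
        if isinstance(step, ToolCall):
            result = tools[step.name](**step.args)   # execute the tool locally
            messages.append(("tool", str(result)))   # feed the result back to the model
        else:
            return validate(step.text)               # validate/parse the structured output

# Scripted fake model: first requests a tool, then returns a final answer.
script = iter([ToolCall("add", {"a": 2, "b": 3}), FinalAnswer("5")])
out = run(
    "what is 2+3?",
    call_model=lambda msgs: next(script),
    tools={"add": lambda a, b: a + b},
    validate=int,
)
print(out)  # 5
```

The real framework adds typing, retries, and streaming on top, but the shape of the loop is the same.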
- [ ] **📝 Homework:** Write a 1-page architecture summary at `~/agent-study/notes/week1-architecture.md`
- Cover: Agent lifecycle, dependency injection pattern, how tools are registered and called, how output validation works
- Draw a simple diagram (ASCII or hand-drawn photo is fine)
---
### Day 2 (Tuesday): Hello World + Core Concepts
**Setup:**
```bash
cd ~/agent-study/week1-pydantic-ai
python -m venv .venv && source .venv/bin/activate
pip install pydantic-ai
```
**Run the quickstart:**
```python
from pydantic_ai import Agent
agent = Agent(
    'anthropic:claude-sonnet-4-0',
    instructions='Be concise, reply with one sentence.',
)
result = agent.run_sync('Where does "hello world" come from?')
print(result.output)
```
**Understand the core API surface:**
- [ ] `agent.run()` vs `agent.run_sync()` vs `agent.run_stream()`
- [ ] How `instructions` work (static string vs dynamic function)
- [ ] How model selection works (string shorthand vs model objects)
- [ ] How `result.output` is typed
- [ ] **📝 Homework:** Build the simplest agent from scratch — NO copy-paste
- Requirements: takes a topic, returns a structured output (use a Pydantic model as the output type)
- Must use at least one custom instruction
- Save at `~/agent-study/week1-pydantic-ai/hello_agent.py`
---
### Day 3 (Wednesday): Intermediate Build — Structured Output + DI
**Focus: Pydantic-AI's killer features — type-safe structured output and dependency injection**
**Work through:**
- [ ] The [bank support agent example](https://ai.pydantic.dev/#tools-dependency-injection-example) from the docs
- [ ] Docs: [Structured Output / Streamed Results](https://ai.pydantic.dev/output#streamed-results)
- [ ] Docs: [Graph Support](https://ai.pydantic.dev/graph)
**Key concepts to grok:**
- How `RunContext[DepsType]` carries typed dependencies
- How Pydantic models as output types create validated, structured responses
- How tool docstrings become the tool description sent to the LLM
- How streaming works with structured output (partial validation!)
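The DI idea can be modeled without the framework: a typed context object is built once per run and handed to every tool, instead of tools reaching for globals. All names below (`Deps`, `RunContext`, `search_recipes`) are illustrative, not pydantic-ai's API.

```python
from dataclasses import dataclass

@dataclass
class Deps:
    recipe_db: dict[str, list[str]]   # stands in for a real DB client

@dataclass
class RunContext:
    deps: Deps                        # carried into every tool call

def search_recipes(ctx: RunContext, ingredient: str) -> list[str]:
    # the tool uses the injected dependency, never a global
    return [name for name, ings in ctx.deps.recipe_db.items() if ingredient in ings]

ctx = RunContext(deps=Deps(recipe_db={"pesto pasta": ["basil", "pasta"], "salad": ["lettuce"]}))
print(search_recipes(ctx, "basil"))  # ['pesto pasta']
```

Swapping `Deps` for a test double is then trivial, which is exactly why the pattern beats global state.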
- [ ] **📝 Homework:** Build an agent that uses the framework's unique capabilities:
- **Must include:** Dependency injection with a real dependency (database mock, API client, etc.)
- **Must include:** Structured output via a Pydantic model (not just string output)
- **Must include:** At least 2 tools
- Example idea: A "recipe finder" agent with deps for a recipe database, tools for searching and filtering, output as a structured `Recipe` model
- Save at `~/agent-study/week1-pydantic-ai/structured_agent.py`
---
### Day 4 (Thursday): Advanced Patterns + Source Code Reading
**Read these source files (in order):**
1. `pydantic_ai_slim/pydantic_ai/agent/__init__.py` — How `Agent` class is defined, the generic type parameters
2. `pydantic_ai_slim/pydantic_ai/tools.py` — How `@tool` works, schema generation, `RunContext`
3. `pydantic_ai_slim/pydantic_ai/_agent_graph.py` — The internal execution engine
4. `pydantic_ai_slim/pydantic_ai/result.py` — How results are wrapped, streamed, validated
5. `pydantic_ai_slim/pydantic_ai/models/__init__.py` — The model provider ABC
**Understand:**
- [ ] How errors from tool execution are passed back to the LLM for retry
- [ ] How streaming works internally (incremental Pydantic validation)
- [ ] How the `_agent_graph.py` orchestrates the conversation loop
- [ ] How durable execution checkpoints work
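Before reading the retry code, it helps to see the shape of the pattern in miniature: a failed tool call is not raised to the caller but turned into a message the model sees, so it can correct its arguments. This is a toy sketch; `ModelRetry` is a name borrowed from pydantic-ai, and the list of argument attempts stands in for the model re-deciding after each error.

```python
class ModelRetry(Exception):
    """Tool signals 'bad arguments, please retry' (name borrowed from pydantic-ai)."""

def call_tool_with_retry(tool, args_attempts, max_retries=2):
    history = []
    for attempt, args in enumerate(args_attempts):
        try:
            return tool(**args), history
        except ModelRetry as e:
            history.append(str(e))        # this message goes back into the conversation
            if attempt >= max_retries:
                raise
    raise RuntimeError("model gave up")

def lookup(city: str):
    if city != "Paris":
        raise ModelRetry(f"unknown city {city!r}, use the capitalized English name")
    return 48.85

# The "model" first sends bad args, then corrects itself after seeing the error.
value, errors = call_tool_with_retry(lookup, [{"city": "paris"}, {"city": "Paris"}])
print(value, errors)
```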
**Explore advanced features:**
- [ ] Docs: [Durable Execution](https://ai.pydantic.dev/durable_execution/overview/)
- [ ] Docs: [MCP Integration](https://ai.pydantic.dev/mcp/overview)
- [ ] Docs: [Human-in-the-Loop](https://ai.pydantic.dev/deferred-tools)
- [ ] Docs: [Evals](https://ai.pydantic.dev/evals)
- [ ] **📝 Homework:** Write "What I'd Steal from Pydantic-AI" at `~/agent-study/notes/week1-steal.md`
- Focus on: DI pattern, type-safe generics, streaming validation, tool retry pattern
- What design decisions are genius? What would you do differently?
---
### Day 5 (Friday): Integration Project + Reflection
- [ ] **Build a mini-project** that integrates with something real:
- **Suggested:** An agent that queries a real API (weather, GitHub, Hacker News), processes the data through tools, and returns a structured report as a Pydantic model
- **Bonus:** Add Logfire observability (it's free tier) and see the traces
- **Bonus:** Expose it as an MCP server
- Save at `~/agent-study/week1-pydantic-ai/integration_project/`
- [ ] **Write retrospective** at `~/agent-study/notes/week1-retro.md`:
- Strengths of Pydantic-AI
- Weaknesses / gaps you noticed
- When would you reach for this vs building from scratch?
- What surprised you?
- [ ] **Start comparison matrix** at `~/agent-study/comparison-matrix/matrix.md` (see [template](#appendix-comparison-matrix-template))
### 🎯 Key Questions — You Should Be Able to Answer:
1. What does the `Agent` class generic signature `Agent[DepsType, OutputType]` buy you?
2. How does dependency injection work in Pydantic-AI and why is it better than global state?
3. How does Pydantic-AI validate structured output from an LLM that returns free-form text?
4. What happens when a tool call fails? How does the retry loop work?
5. What's the difference between `run()`, `run_sync()`, and `run_stream()`?
6. How would you add a new model provider to Pydantic-AI?
7. What is durable execution and when would you use it?
---
## Week 2: Microsoft Agent Framework
> **Difficulty:** ⭐⭐⭐ (Larger surface area, graph concepts, mono-repo navigation)
> **Repo:** [github.com/microsoft/agent-framework](https://github.com/microsoft/agent-framework)
> **Stars:** 7k | **Languages:** Python + .NET | **Born from:** Semantic Kernel + AutoGen
### Why This Is Week 2
If Pydantic-AI is the developer's choice, Microsoft Agent Framework is the enterprise's choice. It introduces graph-based workflows — a fundamentally different orchestration model from the simple agent loop you learned in Week 1. Understanding this framework means understanding where corporate AI agent development is heading.
### Resources
| Resource | Link |
|----------|------|
| 📖 Documentation | [learn.microsoft.com/agent-framework](https://learn.microsoft.com/en-us/agent-framework/) |
| 🚀 Quick Start | [Quick Start Tutorial](https://learn.microsoft.com/agent-framework/tutorials/quick-start) |
| 💬 Discord | [Discord](https://discord.gg/b5zjErwbQM) |
| 🎥 Intro Video (30 min) | [YouTube](https://www.youtube.com/watch?v=AAgdMhftj8w) |
| 🎥 DevUI Demo (1 min) | [YouTube](https://www.youtube.com/watch?v=mOAaGY4WPvc) |
| 📦 PyPI | [agent-framework](https://pypi.org/project/agent-framework/) |
| 📝 Migration from SK | [Semantic Kernel Migration](https://learn.microsoft.com/en-us/agent-framework/migration-guide/from-semantic-kernel) |
| 📝 Migration from AutoGen | [AutoGen Migration](https://learn.microsoft.com/en-us/agent-framework/migration-guide/from-autogen) |
### 🗂 Source Code Guide
```
python/packages/
├── agent-framework/             # ⭐ Core package — agents, middleware, workflows
│   └── src/agent_framework/
│       ├── agents/              # Agent base classes and implementations
│       ├── workflows/           # ⭐ Graph-based workflow engine
│       └── middleware/          # ⭐ Request/response middleware pipeline
├── azure-ai/                    # Azure AI provider (Responses API)
├── openai/                      # OpenAI provider
├── anthropic/                   # Anthropic provider
├── devui/                       # ⭐ Developer UI for debugging workflows
├── mcp/                         # MCP integration
├── a2a/                         # Agent2Agent protocol
└── lab/                         # Experimental features (benchmarking, RL)

python/samples/getting_started/
├── agents/                      # ⭐ Start here — basic agent examples
├── workflows/                   # ⭐ Graph workflow examples (critical!)
├── middleware/                  # Middleware examples
└── observability/               # OpenTelemetry integration
```
> **💡 Tip:** This is a mono-repo. Don't try to read everything. Focus on `python/packages/agent-framework/` for the core, and `python/samples/getting_started/workflows/` for the graph workflow examples.
---
### Day 1 (Monday): Architecture Deep Dive
**Read:**
- [ ] [Overview](https://learn.microsoft.com/agent-framework/overview/agent-framework-overview)
- [ ] The full [README](https://github.com/microsoft/agent-framework)
- [ ] [User Guide Overview](https://learn.microsoft.com/en-us/agent-framework/user-guide/overview)
- [ ] Watch the [30-min intro video](https://www.youtube.com/watch?v=AAgdMhftj8w) (at 1.5x speed)
- [ ] Skim the [SK migration guide](https://learn.microsoft.com/en-us/agent-framework/migration-guide/from-semantic-kernel) to understand lineage
**Identify core abstractions:**
- `Agent` — base agent interface
- `Workflow` / `Graph` — the graph-based orchestration system
- `Middleware` — request/response processing pipeline
- `AgentProvider` — LLM provider abstraction
- `DevUI` — visual debugging tool
**Key architectural insight:** This framework uses a **data-flow graph** model where nodes are agents or functions, and edges carry data between them. This is fundamentally different from Pydantic-AI's linear agent loop.
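The data-flow idea fits in a few lines of plain Python (this is a concept sketch using the stdlib's `graphlib`, not the Agent Framework API): nodes are functions, edges say whose output feeds whose input, and execution follows dependency order.

```python
from graphlib import TopologicalSorter

def run_graph(nodes, edges, inputs):
    """Execute single-input nodes in dependency order; edges are (src, dst) pairs."""
    deps = {}
    for src, dst in edges:
        deps.setdefault(dst, set()).add(src)
    results = dict(inputs)
    for name in TopologicalSorter(deps).static_order():
        if name in results:          # externally supplied input, not a node
            continue
        args = [results[src] for src, dst in edges if dst == name]
        results[name] = nodes[name](*args)
    return results

nodes = {
    "research": lambda topic: f"facts about {topic}",   # would be an agent node
    "format":   lambda facts: facts.upper(),            # deterministic function node
    "write":    lambda text: f"POST: {text}",           # another agent node
}
edges = [("topic", "research"), ("research", "format"), ("format", "write")]
out = run_graph(nodes, edges, {"topic": "AI agents"})
print(out["write"])  # POST: FACTS ABOUT AI AGENTS
```

Notice that the same executor runs LLM-backed nodes and pure functions identically, which is the point of the model.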
- [ ] **📝 Homework:** Write a 1-page architecture summary at `~/agent-study/notes/week2-architecture.md`
- Compare the graph workflow model to Pydantic-AI's linear model
- Draw the graph workflow concept (nodes = agents/functions, edges = data flow)
---
### Day 2 (Tuesday): Hello World + Core Concepts
**Setup:**
```bash
cd ~/agent-study/week2-ms-agent
python -m venv .venv && source .venv/bin/activate
pip install agent-framework --pre
# You'll need Azure credentials or an OpenAI key
```
**Run the quickstart:**
```python
import asyncio
from agent_framework.openai import OpenAIChatClient
async def main():
    agent = OpenAIChatClient(
        api_key="your-key"
    ).as_agent(
        name="HaikuBot",
        instructions="You are an upbeat assistant that writes beautifully.",
    )
    print(await agent.run("Write a haiku about AI agents."))

asyncio.run(main())
```
**Understand:**
- [ ] `as_agent()` pattern — how providers become agents
- [ ] The difference between Chat agents and Responses agents
- [ ] How the Python API differs from the .NET API (skim a .NET example)
- [ ] **📝 Homework:** Build the simplest agent from scratch — NO copy-paste
- Save at `~/agent-study/week2-ms-agent/hello_agent.py`
---
### Day 3 (Wednesday): Intermediate Build — Graph Workflows
**This is the key differentiator. This is the day that matters.**
**Work through:**
- [ ] `python/samples/getting_started/workflows/` — all examples
- [ ] Docs: Workflow/Graph tutorials on learn.microsoft.com
- [ ] Understand streaming, checkpointing, and time-travel in graphs
**Key concepts:**
- How nodes in a graph can be agents OR deterministic functions
- How data flows between nodes via typed edges
- How checkpointing enables pause/resume of long-running workflows
- How human-in-the-loop fits into the graph model
- How time-travel lets you replay/debug workflows
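Checkpointing is easiest to grasp as a toy first: persist the next step index and the state after every node, so a killed workflow resumes where it left off. The framework persists real graph state; this sketch just writes a JSON file, and all names in it are made up.

```python
import json, os, tempfile

def run_with_checkpoints(steps, state, path):
    start = 0
    if os.path.exists(path):                      # resume from the last checkpoint
        with open(path) as f:
            saved = json.load(f)
        start, state = saved["step"], saved["state"]
    for i in range(start, len(steps)):
        state = steps[i](state)
        with open(path, "w") as f:                # checkpoint after every node
            json.dump({"step": i + 1, "state": state}, f)
    return state

steps = [lambda s: s + ["researched"], lambda s: s + ["formatted"], lambda s: s + ["written"]]
path = os.path.join(tempfile.mkdtemp(), "ckpt.json")
final = run_with_checkpoints(steps, [], path)
print(final)                                      # all three nodes ran
print(run_with_checkpoints(steps, [], path))      # resumes: nothing left to do
```

Time-travel debugging is the same idea read backwards: keep every checkpoint instead of the latest, and you can replay the run from any node.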
- [ ] **📝 Homework:** Build a graph workflow:
- **Must include:** At least 3 nodes (mix of agent nodes and function nodes)
- **Must include:** Branching logic (conditional edges)
- Example idea: A "content pipeline" — Node 1 (agent: research a topic) → Node 2 (function: format research) → Node 3 (agent: write blog post) with a branch for "needs more research"
- Save at `~/agent-study/week2-ms-agent/graph_workflow.py`
---
### Day 4 (Thursday): Advanced Patterns + Source Code Reading
**Read these source files:**
1. Core agent base classes in `python/packages/agent-framework/`
2. Workflow/graph engine implementation
3. Middleware pipeline implementation
4. DevUI package structure
5. At least one provider implementation (OpenAI or Azure)
**Explore:**
- [ ] Set up and run the **DevUI** — visualize your graph workflow from Day 3
- [ ] Look at the **OpenTelemetry integration** in `python/samples/getting_started/observability/`
- [ ] Read the **middleware examples** — understand the request/response pipeline
- [ ] Check out the **lab package** — what's experimental?
- [ ] **📝 Homework:** Write "What I'd Steal from MS Agent Framework" at `~/agent-study/notes/week2-steal.md`
- Focus on: Graph workflow model, DevUI concept, middleware pipeline, multi-language support
- Compare to Pydantic-AI: when would you choose one over the other?
---
### Day 5 (Friday): Integration Project + Reflection
- [ ] **Build a mini-project:**
- **Suggested:** A multi-step data processing pipeline using graph workflows
- Must have: at least one agent node calling an LLM, at least one pure function node, checkpointing enabled
- **Bonus:** Get the DevUI running and screenshot your workflow visualization
- Save at `~/agent-study/week2-ms-agent/integration_project/`
- [ ] **Write retrospective** at `~/agent-study/notes/week2-retro.md`
- [ ] **Update comparison matrix** — add MS Agent Framework entry
### 🎯 Key Questions:
1. What's the difference between a linear agent loop and a graph-based workflow?
2. How does checkpointing work in MS Agent Framework workflows?
3. What does "time-travel" mean in the context of agent debugging?
4. How does the middleware pipeline work and when would you use it?
5. What's the DevUI and what can you debug with it that you can't with logs alone?
6. How does this framework's agent abstraction compare to Pydantic-AI's `Agent` class?
7. When would you choose MS Agent Framework over Pydantic-AI? (Think: team size, workflow complexity, language requirements)
---
## Week 3: Agent-S
> **Difficulty:** ⭐⭐⭐⭐ (Requires GPU for grounding model, novel paradigm, research-grade code)
> **Repo:** [github.com/simular-ai/Agent-S](https://github.com/simular-ai/Agent-S)
> **Stars:** 9.6k | **Language:** Python | **Papers:** ICLR 2025, COLM 2025
### Why This Is Week 3
This is a completely different paradigm. Weeks 1-2 were about agents that work with APIs and text. Agent-S works with **pixels and clicks** — it uses your computer like a human does. This is the frontier of agent development. Understanding Agent-S means understanding where computer-use agents are heading.
### Resources
| Resource | Link |
|----------|------|
| 📖 Repo | [github.com/simular-ai/Agent-S](https://github.com/simular-ai/Agent-S) |
| 💬 Discord | [Discord](https://discord.gg/E2XfsK9fPV) |
| 📄 S1 Paper (ICLR 2025) | [arxiv.org/abs/2410.08164](https://arxiv.org/abs/2410.08164) |
| 📄 S2 Paper (COLM 2025) | [arxiv.org/abs/2504.00906](https://arxiv.org/abs/2504.00906) |
| 📄 S3 Paper | [arxiv.org/abs/2510.02250](https://arxiv.org/abs/2510.02250) |
| 🌐 S3 Blog | [simular.ai/articles/agent-s3](https://www.simular.ai/articles/agent-s3) |
| 🎥 S3 Video | [YouTube](https://www.youtube.com/watch?v=VHr0a3UBsh4) |
| 📦 PyPI | [gui-agents](https://pypi.org/project/gui-agents/) |
| 🤗 Grounding Model | [UI-TARS-1.5-7B](https://huggingface.co/ByteDance-Seed/UI-TARS-1.5-7B) |
### 🗂 Source Code Guide
```
gui_agents/
├── s3/                  # ⭐ Latest version — start here
│   ├── cli_app.py       # ⭐ Entry point — CLI application, main loop
│   ├── agents/          # ⭐ Agent implementations (planning, grounding, execution)
│   ├── core/            # ⭐ Core abstractions (screenshot, actions, state)
│   ├── bbon/            # Behavior Best-of-N — sampling strategy for better performance
│   └── prompts/         # System prompts for each agent role
├── s2/                  # Previous version
├── s2_5/                # Intermediate version
├── s1/                  # Original version (ICLR 2025)
└── utils.py             # Shared utilities
```
> **💡 Tip:** Focus entirely on `gui_agents/s3/`. Read the papers' system diagrams first, THEN the code. The code makes 10x more sense with the paper's architecture diagram in front of you.
> **⚠️ Setup Note:** Agent-S requires a grounding model (UI-TARS-1.5-7B). You can host it on Hugging Face Inference Endpoints (~$1-2/hr for A10G), use a free tier if available, or run it locally if you have a capable GPU (16GB+ VRAM). Alternatively, study the code architecture without running the full system.
---
### Day 1 (Monday): Architecture Deep Dive
**Read:**
- [ ] The full [README](https://github.com/simular-ai/Agent-S)
- [ ] [S3 blog post](https://www.simular.ai/articles/agent-s3) — accessible overview
- [ ] **S1 Paper** (at least abstract + Sections 1-3) — core architecture concepts
- [ ] **S3 Paper** (abstract + architecture section) — latest improvements
- [ ] `models.md` in the repo — supported model configurations
**Identify core abstractions:**
- **Screenshot Capture** — the agent "sees" the screen as an image
- **Grounding Model** (UI-TARS) — converts screenshots to UI element locations
- **Planning Agent** — decides what to do based on current screen + goal
- **Execution Agent** — translates plans into mouse/keyboard actions
- **Behavior Best-of-N (bBoN)** — run multiple rollouts, pick the best
**The pipeline:**
```
Task → Screenshot → Grounding (UI-TARS: identify elements) → Planning (LLM: what to do) → Action (click/type/scroll) → New Screenshot → Loop
```
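The loop above can be sketched with every component stubbed out (in Agent-S, grounding is UI-TARS and planning is an LLM; here they are deterministic fakes, and all names are illustrative):

```python
def computer_use_loop(goal, screenshot, ground, plan, execute, max_steps=5):
    trace = []
    for _ in range(max_steps):
        shot = screenshot()                  # capture current screen
        elements = ground(shot)              # pixels -> labeled UI elements + coordinates
        action = plan(goal, elements)        # decide the next action from goal + elements
        trace.append(action)
        if action == ("done",):
            return trace
        execute(action)                      # click / type at the OS level
    return trace

# Fake desktop: one URL bar whose contents the "execute" step can change.
state = {"url_bar_text": ""}
trace = computer_use_loop(
    goal="open example.com",
    screenshot=lambda: dict(state),
    ground=lambda shot: {"url_bar": (120, 40, shot["url_bar_text"])},
    plan=lambda goal, els: ("done",) if "example.com" in els["url_bar"][2]
                           else ("type", "url_bar", "example.com"),
    execute=lambda a: state.update(url_bar_text=a[2]) if a[0] == "type" else None,
)
print(trace)  # [('type', 'url_bar', 'example.com'), ('done',)]
```

The key observation: the agent never gets ground truth about the UI, only a fresh screenshot each iteration, so every loop must re-perceive the world.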
- [ ] **📝 Homework:** Write architecture summary at `~/agent-study/notes/week3-architecture.md`
- Include the screenshot→grounding→planning→action pipeline
- Explain bBoN and why it matters (72.6% vs 66% on OSWorld)
- Compare: how is "seeing" a screen different from "calling" an API?
---
### Day 2 (Tuesday): Hello World + Core Concepts
**Setup:**
```bash
cd ~/agent-study/week3-agent-s
python -m venv .venv && source .venv/bin/activate
pip install gui-agents
brew install tesseract  # required OCR dependency (macOS; on Linux: sudo apt install tesseract-ocr)
```
**API configuration:**
```bash
export OPENAI_API_KEY=<your-key>
export ANTHROPIC_API_KEY=<your-key>
export HF_TOKEN=<your-huggingface-token>
```
**Run Agent-S3 (if you have grounding model access):**
```bash
agent_s \
--provider openai \
--model gpt-4o \
--ground_provider huggingface \
--ground_url <your-endpoint-url> \
--ground_model ui-tars-1.5-7b \
--grounding_width 1920 \
--grounding_height 1080
```
> **If you can't run it:** Read through `gui_agents/s3/cli_app.py` line by line and trace the execution flow. Understand what WOULD happen at each step.
- [ ] **📝 Homework:** Even if you can't run the full agent, build a minimal screenshot → analysis script:
```python
# Take a screenshot, send it to a vision model, get a description of UI elements
# This exercises the same "visual grounding" concept, just simplified
```
- Save at `~/agent-study/week3-agent-s/hello_agent.py`
---
### Day 3 (Wednesday): Intermediate Build — Understanding Computer Use
**Work through:**
- [ ] Read `gui_agents/s3/agents/` — understand the multi-agent architecture
- [ ] Read `gui_agents/s3/core/` — how screenshots are captured and actions are executed
- [ ] Study the prompt templates in `gui_agents/s3/` — how the LLM is instructed
- [ ] Understand the bBoN strategy in `gui_agents/s3/bbon/`
**Key concepts:**
- How screenshots are processed and annotated for the LLM
- How the grounding model converts visual elements to coordinates
- How actions (click, type, scroll) are executed on the OS level
- Cross-platform differences (Linux/Mac/Windows)
- The local coding environment feature
- [ ] **📝 Homework:** Build something that uses the computer-use paradigm:
- **Option A (with GPU):** Give Agent-S a simple task (open a browser, search for something, copy a result)
- **Option B (without GPU):** Build a simplified "screen reader" agent that takes a screenshot, uses a vision model to understand the UI, and outputs a structured description of what's on screen + suggested next actions
- Save at `~/agent-study/week3-agent-s/computer_use_demo/`
---
### Day 4 (Thursday): Advanced Patterns + Source Code Reading
**Read these source files (in order):**
1. `gui_agents/s3/cli_app.py` — Main entry point, execution loop
2. `gui_agents/s3/agents/` — Each agent role (planner, executor, grounding)
3. `gui_agents/s3/core/` — Screenshot capture, action execution, state management
4. `gui_agents/s3/bbon/` — Behavior Best-of-N implementation
5. `gui_agents/s1/` (briefly) — Compare S1 architecture to S3 to see evolution
**Explore the papers' techniques:**
- [ ] How does "experience-augmented hierarchical planning" work? (S1)
- [ ] What's the "Mixture of Grounding" approach? (S2)
- [ ] How does S3 achieve simplicity while improving performance?
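Behavior Best-of-N reduces to a small pattern: run the same task N times (real rollouts differ because LLM sampling is stochastic), then have a judge pick the best trajectory. The sketch below is a toy with stub rollout/judge functions; in S3 the judge is itself an LLM comparing full behavior narratives.

```python
import random

def best_of_n(rollout, judge, n=3, seed=0):
    rng = random.Random(seed)                  # seeded so this sketch is deterministic
    candidates = [rollout(rng) for _ in range(n)]
    return max(candidates, key=judge)

rollout = lambda rng: {"steps": rng.randint(3, 10), "done": True}   # fake trajectory
judge = lambda t: (t["done"], -t["steps"])     # prefer completed, then shorter, runs
best = best_of_n(rollout, judge, n=5)
print(best["done"], best["steps"])
```

The cost model is worth noticing: N rollouts cost roughly N times as much, so bBoN trades compute for reliability.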
- [ ] **📝 Homework:** Write "What I'd Steal from Agent-S" at `~/agent-study/notes/week3-steal.md`
- Focus on: The screenshot→grounding→action pipeline, bBoN strategy, cross-platform abstractions
- Think about: Could you add computer-use capabilities to a Pydantic-AI agent as a tool?
---
### Day 5 (Friday): Integration Project + Reflection
- [ ] **Build a mini-project:**
- **Suggested:** A "screen monitoring" agent that periodically screenshots your desktop, uses a vision model to understand what's happening, and logs structured summaries (using Pydantic-AI for the structured output!)
- **Alternative:** Build a browser automation agent using Playwright + vision model (a simplified version of Agent-S's approach)
- Save at `~/agent-study/week3-agent-s/integration_project/`
- [ ] **Write retrospective** at `~/agent-study/notes/week3-retro.md`
- [ ] **Update comparison matrix**
### 🎯 Key Questions:
1. What is the screenshot → grounding → action pipeline and why is it powerful?
2. Why does Agent-S need a separate grounding model (UI-TARS) in addition to the planning LLM?
3. What is Behavior Best-of-N and how does it improve performance by ~6%?
4. How is computer-use fundamentally different from API-based agent frameworks?
5. What are the security implications of an agent that can control your mouse and keyboard?
6. What's the difference between Agent-S's approach and Anthropic's Computer Use or OpenAI's Operator?
7. When would you use computer-use agents vs. API-based agents? Give 3 examples of each.
---
## Week 4: GPT Researcher
> **Difficulty:** ⭐⭐ (Straightforward architecture, well-documented, familiar patterns)
> **Repo:** [github.com/assafelovic/gpt-researcher](https://github.com/assafelovic/gpt-researcher)
> **Stars:** 25k | **Language:** Python
### Why This Is Week 4
After 3 weeks of studying *how* agents work internally, this week is about studying a *complete, purpose-built* agent that does one thing extremely well: research. GPT Researcher is the best example of the "Plan-and-Solve + RAG" pattern — a design you'll reuse in your own projects.
### Resources
| Resource | Link |
|----------|------|
| 📖 Documentation | [docs.gptr.dev](https://docs.gptr.dev/docs/gpt-researcher/getting-started) |
| 💬 Discord | [Discord](https://discord.gg/QgZXvJAccX) |
| 📦 PyPI | [gpt-researcher](https://pypi.org/project/gpt-researcher/) |
| 📝 Blog: How it was built | [docs.gptr.dev/blog](https://docs.gptr.dev/blog/building-gpt-researcher) |
| 🎥 Demo | [YouTube](https://www.youtube.com/watch?v=f60rlc_QCxE) |
| 🔧 MCP Integration | [MCP Guide](https://docs.gptr.dev/docs/gpt-researcher/retrievers/mcp-configs) |
| 📜 Plan-and-Solve Paper | [arxiv.org/abs/2305.04091](https://arxiv.org/abs/2305.04091) |
### 🗂 Source Code Guide
```
gpt_researcher/
├── agent.py                     # ⭐ THE file. GPTResearcher class — the entire research orchestration
├── actions/                     # ⭐ Research actions (generate questions, search, scrape, synthesize)
│   ├── query_processing.py      # How research questions are generated from the user query
│   ├── web_search.py            # Web search execution
│   └── report_generation.py     # Final report synthesis
├── config/                      # Configuration management
│   └── config.py                # All configurable parameters
├── context/                     # ⭐ Context management — how gathered info is stored/retrieved
│   └── compression.py           # How context is compressed to fit token limits
├── document/                    # Document processing (PDF, web pages, etc.)
├── memory/                      # ⭐ Research memory — how the agent remembers what it's found
├── orchestrator/                # ⭐ Deep research — recursive tree exploration
│   └── agent/                   # Sub-agents for deep research mode
├── retrievers/                  # ⭐ Web/local search implementations (Tavily, DuckDuckGo, MCP, etc.)
└── scraper/                     # Web scraping implementations
```
> **💡 Tip:** `agent.py` is the heart. It's one file, ~700 lines, and it contains the entire research orchestration. Read it top to bottom. Then read `actions/` to understand each step.
---
### Day 1 (Monday): Architecture Deep Dive
**Read:**
- [ ] Full [README](https://github.com/assafelovic/gpt-researcher)
- [ ] [How it was built](https://docs.gptr.dev/blog/building-gpt-researcher) — the design blog post
- [ ] [Getting Started](https://docs.gptr.dev/docs/gpt-researcher/getting-started)
- [ ] [Customization docs](https://docs.gptr.dev/docs/gpt-researcher/gptr/config)
**Understand the Plan-and-Solve architecture:**
```
User Query
→ Planner Agent: Generate N research questions
→ For each question:
→ Crawler Agent: Search web, gather sources
→ Summarizer: Extract relevant info from each source
→ Source tracker: Track citations
→ Publisher Agent: Aggregate all findings into a report
```
**Deep Research mode adds recursion:**
```
User Query → Generate sub-topics → For each sub-topic → Generate deeper sub-topics → ... → Aggregate bottom-up
```
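The Plan-and-Solve pipeline above can be sketched with every stage stubbed (this is a toy model, not GPT Researcher's API; `plan`, `search`, `summarize`, and `write` are illustrative stand-ins for the planner, crawler, summarizer, and publisher):

```python
def research(query, plan, search, summarize, write):
    questions = plan(query)                         # planner: decompose the query
    findings = []
    for q in questions:
        sources = search(q)                         # crawler: gather sources per question
        findings.append((q, summarize(sources), [s["url"] for s in sources]))
    return write(query, findings)                   # publisher: aggregate with citations

report = research(
    "state of agent frameworks",
    plan=lambda q: [f"{q}: history", f"{q}: key players"],
    search=lambda q: [{"url": "https://example.com/stub", "text": f"notes on {q}"}],
    summarize=lambda srcs: "; ".join(s["text"] for s in srcs),
    write=lambda q, f: "\n".join(
        f"## {ques}\n{summ} [{', '.join(cites)}]" for ques, summ, cites in f
    ),
)
print(report)
```

Deep Research mode is this same function calling itself on each question, recursing until a depth limit, then aggregating bottom-up.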
- [ ] **📝 Homework:** Write architecture summary at `~/agent-study/notes/week4-architecture.md`
---
### Day 2 (Tuesday): Hello World + Core Concepts
**Setup:**
```bash
cd ~/agent-study/week4-gpt-researcher
python -m venv .venv && source .venv/bin/activate
pip install gpt-researcher
# Set required API keys
export OPENAI_API_KEY=<your-key>
export TAVILY_API_KEY=<your-key>
```
**Run the simplest version:**
```python
from gpt_researcher import GPTResearcher
import asyncio
async def main():
    query = "What are the latest advancements in AI agent frameworks in 2025-2026?"
    researcher = GPTResearcher(query=query)
    research_result = await researcher.conduct_research()
    report = await researcher.write_report()
    print(report)

asyncio.run(main())
```
**Also try the web UI:**
```bash
git clone https://github.com/assafelovic/gpt-researcher.git
cd gpt-researcher
pip install -r requirements.txt
python -m uvicorn main:app --reload
# Visit http://localhost:8000
```
- [ ] **📝 Homework:** Build a minimal research agent from scratch — NO copy-paste
- Save at `~/agent-study/week4-gpt-researcher/hello_researcher.py`
---
### Day 3 (Wednesday): Intermediate Build — Deep Research + MCP
**Focus: GPT Researcher's key differentiators — Deep Research mode and MCP integration**
**Work through:**
- [ ] [Deep Research docs](https://docs.gptr.dev/docs/gpt-researcher/gptr/deep-research)
- [ ] [MCP Integration Guide](https://docs.gptr.dev/docs/gpt-researcher/retrievers/mcp-configs)
- [ ] [Local document research](https://docs.gptr.dev/docs/gpt-researcher/gptr/local-docs)
- [ ] Run a Deep Research query and observe the recursive tree exploration
**Key concepts:**
- How Deep Research recursively explores sub-topics
- How MCP connects GPT Researcher to external data sources
- How context compression prevents token limit issues
- How source tracking and citations work
- The difference between web research and local document research
- [ ] **📝 Homework:** Build a research agent that uses GPT Researcher's unique capabilities:
- **Must include:** MCP integration with at least one external source (e.g., GitHub MCP server)
- **OR:** Research over local documents (PDFs, markdown files from your study notes)
- **Bonus:** Use Deep Research mode for a complex topic
- Save at `~/agent-study/week4-gpt-researcher/deep_research_demo.py`
---
### Day 4 (Thursday): Advanced Patterns + Source Code Reading
**Read these source files (in order):**
1. `gpt_researcher/agent.py` — The entire GPTResearcher class, top to bottom
2. `gpt_researcher/actions/query_processing.py` — How research questions are generated
3. `gpt_researcher/context/compression.py` — How context is managed within token limits
4. `gpt_researcher/orchestrator/` — Deep research recursive tree implementation
5. `gpt_researcher/retrievers/` — How different search providers are integrated
**Understand:**
- [ ] How the planner decomposes a query into research questions
- [ ] How the agent handles rate limiting and API failures
- [ ] How context compression works (this is critical for long research)
- [ ] How the orchestrator manages the recursive tree in Deep Research mode
- [ ] How the report generator synthesizes multiple sources into a coherent report
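To make the compression idea concrete, here is a toy version: rank gathered chunks by relevance to the query and keep the top ones within a token budget. A trivial word-overlap scorer stands in for the embedding-based relevance the real `context/compression.py` uses.

```python
# Toy context compression: keep the most query-relevant chunks within a budget.
# Word-overlap scoring stands in for embedding-based relevance used in practice.

def score(query: str, chunk: str) -> float:
    q, c = set(query.lower().split()), set(chunk.lower().split())
    return len(q & c) / max(len(q), 1)

def compress(query: str, chunks: list[str], budget_words: int) -> list[str]:
    ranked = sorted(chunks, key=lambda ch: score(query, ch), reverse=True)
    kept, used = [], 0
    for ch in ranked:
        words = len(ch.split())
        if used + words <= budget_words:
            kept.append(ch)
            used += words
    return kept

chunks = [
    "agent frameworks coordinate multiple LLM calls",
    "the weather today is sunny and warm",
    "LLM agent orchestration patterns and frameworks",
]
print(compress("LLM agent frameworks", chunks, budget_words=12))
```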
- [ ] **📝 Homework:** Write "What I'd Steal from GPT Researcher" at `~/agent-study/notes/week4-steal.md`
- Focus on: Plan-and-Solve decomposition, context compression, source tracking, recursive exploration
- Compare: how would you build "deep research" capability into a Pydantic-AI agent?
---
### Day 5 (Friday): Integration Project + Reflection
- [ ] **Build a mini-project:**
- **Suggested:** A "competitive analysis" agent — given a company/product, it researches competitors, pricing, features, and generates a structured comparison report. Use GPT Researcher's engine + Pydantic-AI for structured output.
- **Alternative:** Install GPT Researcher as a [Claude Skill](https://skills.sh/assafelovic/gpt-researcher/gpt-researcher) and use it in your Claude workflow
- Save at `~/agent-study/week4-gpt-researcher/integration_project/`
- [ ] **Write retrospective** at `~/agent-study/notes/week4-retro.md`
- [ ] **Update comparison matrix**
### 🎯 Key Questions:
1. What is the Plan-and-Solve pattern and how does GPT Researcher implement it?
2. How does Deep Research differ from regular research? Draw the tree structure.
3. How does context compression prevent token limit issues during long research?
4. How does GPT Researcher track and cite sources?
5. What search providers does GPT Researcher support and how do you add a new one?
6. How could you combine GPT Researcher with Pydantic-AI for structured research outputs?
7. What are the limitations of automated research (hallucination, bias, recency)?
---
## Week 5: Yao
> **Difficulty:** ⭐⭐⭐⭐ (Go language, novel architecture, less documentation, paradigm shift)
> **Repo:** [github.com/YaoApp/yao](https://github.com/YaoApp/yao)
> **Stars:** 7.5k | **Language:** Go | **Runtime:** Single binary with V8 engine
### Why This Is Week 5
Yao is the most architecturally unique repo in the entire study. It's not a chatbot framework — it's an **autonomous agent engine** where agents are triggered by events, schedules, and emails. This is the only Go-based framework, the only one with event-driven architecture, and the only one that deploys as a single binary. If everything else is "AI assistant," Yao is "AI team member."
> **⚠️ Language Note:** This week requires Go. If you don't know Go, spend an extra hour on Day 1 doing the [Go Tour](https://go.dev/tour/). You don't need to be fluent — just enough to read the source code.
### Resources
| Resource | Link |
|----------|------|
| 🏠 Homepage | [yaoapps.com](https://yaoapps.com) |
| 📖 Documentation | [yaoapps.com/docs](https://yaoapps.com/docs) |
| 🚀 Quick Start | [Getting Started](https://yaoapps.com/docs/documentation/en-us/getting-started) |
| ✨ Why Yao? | [Why Yao](https://yaoapps.com/docs/documentation/en-us/getting-started/why-yao) |
| 🤖 Agent Examples | [YaoAgents/awesome](https://github.com/YaoAgents/awesome) |
| 📦 Install Script | `curl -fsSL https://yaoapps.com/install.sh \| bash` |
| 🐹 Go Tour (if needed) | [go.dev/tour](https://go.dev/tour/) |
### 🗂 Source Code Guide
```
yao/
├── engine/
│ └── process.go # ⭐ Process engine — core concept in Yao
├── agent/ # ⭐ Agent framework — autonomous agent definitions
│ ├── agent.go # Agent lifecycle, trigger modes, execution phases
│ └── triggers/ # Clock, Human, Event trigger implementations
├── runtime/
│ └── v8/ # ⭐ Built-in V8 JavaScript/TypeScript engine
├── rag/
│ └── graph/ # ⭐ Built-in GraphRAG implementation
├── mcp/ # MCP integration
├── api/ # HTTP server and REST API
├── model/ # ORM and database layer
└── cmd/
└── yao/
└── main.go # Application entry point
```
> **💡 Tip:** Yao's DSL-based approach means you'll be reading `.yao` files (YAML-like definitions) as much as Go source code. The mental model is: you define agents as data (DSL), and the engine executes them.
---
### Day 1 (Monday): Architecture Deep Dive
**Read:**
- [ ] Full [README](https://github.com/YaoApp/yao)
- [ ] [Why Yao?](https://yaoapps.com/docs/documentation/en-us/getting-started/why-yao)
- [ ] [Documentation overview](https://yaoapps.com/docs)
- [ ] Skim the Go source: `cmd/yao/main.go` → `engine/process.go` → `agent/agent.go`
**Understand Yao's radical differences:**
| Traditional Agent | Yao Agent |
|-------------------|-----------|
| Entry point: chatbox | Entry point: email, events, schedules |
| Passive: you ask, it answers | Proactive: it works autonomously |
| Role: tool | Role: team member |
**The six-phase execution model:**
```
Inspiration → Goals → Tasks → Run → Deliver → Learn
```
**Three trigger modes:**
1. **Clock** — scheduled tasks (cron-like)
2. **Human** — triggered by email or messages
3. **Event** — triggered by webhooks or database changes
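The three trigger modes can be pictured as one dispatcher routing different event sources into the same agent entry point. A minimal Python sketch (Yao does this in Go, with agents defined as DSL, so everything below is purely illustrative):

```python
# Minimal sketch of Yao-style trigger routing: one agent, three entry points.
# Illustrative only; Yao implements this in Go with agents defined in DSL files.

from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    log: list[str] = field(default_factory=list)

    def run(self, trigger: str, payload: str) -> str:
        # Stand-in for the six-phase pipeline (Inspiration ... Learn)
        result = f"[{self.name}] {trigger}: {payload}"
        self.log.append(result)  # crude "Learn" phase: remember past runs
        return result

agent = Agent("briefing-bot")

def on_clock(schedule: str):          # Clock: cron-like schedule fires
    return agent.run("clock", schedule)

def on_human(email_subject: str):     # Human: an email or message arrives
    return agent.run("human", email_subject)

def on_event(webhook_body: str):      # Event: webhook or DB change fires
    return agent.run("event", webhook_body)

on_clock("every day 08:00")
on_human("Re: quarterly report")
on_event("db.users.inserted")
print(agent.log)
```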
- [ ] **📝 Homework:** Write architecture summary at `~/agent-study/notes/week5-architecture.md`
- Focus on: How the event-driven model is fundamentally different from request-response
- Compare: 6-phase execution vs Pydantic-AI's run loop vs MS Agent Framework's graph
---
### Day 2 (Tuesday): Hello World + Core Concepts
**Setup:**
```bash
# Install Yao (single binary!)
curl -fsSL https://yaoapps.com/install.sh | bash
# Create a project
cd ~/agent-study/week5-yao
mkdir project && cd project
yao start # First run creates project structure
# Visit http://127.0.0.1:5099
```
**Run your first process:**
```bash
yao run utils.app.Ping # Returns version
yao run scripts.tests.Hello 'Hello, Yao!' # Run TypeScript
yao run models.tests.pet.Find 1 '::{}' # Query database
```
**Understand core concepts:**
- [ ] **Processes** — functions that can be run directly or referenced in code
- [ ] **Models** — database models defined in `.mod.yao` files
- [ ] **Scripts** — TypeScript/JavaScript code executed by the built-in V8 engine
- [ ] **DSL** — Yao's declarative syntax for defining everything
- [ ] **📝 Homework:** Build the simplest Yao application from scratch:
- Define a model, write a process, create a simple API endpoint
- Save project at `~/agent-study/week5-yao/hello_project/`
---
### Day 3 (Wednesday): Intermediate Build — Event-Driven Agents
**Focus: What makes Yao unique — event-driven, proactive agents**
**Work through:**
- [ ] Agent configuration — defining agents with roles and triggers
- [ ] Setting up a scheduled (Clock) trigger
- [ ] Setting up an Event trigger (webhook → agent action)
- [ ] MCP integration — connecting external tools
- [ ] GraphRAG — how the built-in knowledge graph works
**Key concepts:**
- How agents are defined declaratively (vs. programmatically in Python frameworks)
- How the three trigger modes work in practice
- How agents learn from past executions (the "Learn" phase)
- How GraphRAG combines vector search with graph traversal
- Why a single binary matters for deployment
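The hybrid retrieval idea behind GraphRAG is easy to sketch: use similarity to pick an entry node, then walk graph edges to pull in related entities. Word overlap stands in for real embeddings, and the data model here is invented for illustration; Yao's actual implementation lives in `rag/graph/`.

```python
# Toy GraphRAG: vector-style scoring finds an entry node, then graph
# traversal pulls in related entities. Illustrative only.

def similarity(query: str, text: str) -> float:
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / max(len(q), 1)

nodes = {
    "pydantic-ai": "python agent framework with typed outputs",
    "yao": "go engine for event driven agents",
    "graphrag": "retrieval combining vectors with knowledge graphs",
}
edges = {  # entity-relationship links
    "yao": ["graphrag"],
    "graphrag": ["yao"],
    "pydantic-ai": [],
}

def graph_rag(query: str, hops: int = 1) -> list[str]:
    # 1) vector-ish step: best-matching entry node
    entry = max(nodes, key=lambda n: similarity(query, nodes[n]))
    # 2) graph step: expand along edges for extra context
    result, frontier = [entry], [entry]
    for _ in range(hops):
        frontier = [nb for n in frontier for nb in edges[n] if nb not in result]
        result.extend(frontier)
    return result

print(graph_rag("event driven agents in go"))
```

Vector-only RAG would stop after step 1; the graph hop is what surfaces related entities the query never mentioned.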
- [ ] **📝 Homework:** Build an event-driven agent:
- **Must include:** At least 2 different trigger modes (e.g., Clock + Event)
- **Must include:** An agent that does something proactively (not just responding to a chat)
- Example idea: An agent that checks an RSS feed on a schedule (Clock), processes new articles (Run), and stores summaries in the knowledge base (Learn/Deliver)
- Save at `~/agent-study/week5-yao/event_agent/`
---
### Day 4 (Thursday): Advanced Patterns + Source Code Reading
**Read these source files (in order):**
1. `cmd/yao/main.go` — Application entry point, how the single binary initializes
2. `engine/process.go` — The process engine (core execution abstraction)
3. `agent/agent.go` — Agent lifecycle and execution phases
4. `runtime/v8/` — How the V8 engine is embedded for TypeScript support
5. `rag/graph/` — GraphRAG implementation (vector + graph hybrid search)
**Understand:**
- [ ] How Go's concurrency model (goroutines) enables event-driven agents
- [ ] How the V8 engine is embedded and used for TypeScript execution
- [ ] How GraphRAG combines embedding search with entity-relationship traversal
- [ ] How a single Go binary includes all these features without external dependencies
- [ ] **📝 Homework:** Write "What I'd Steal from Yao" at `~/agent-study/notes/week5-steal.md`
- Focus on: Event-driven architecture, single binary deployment, GraphRAG, DSL approach
- Think about: Could you add event-driven capabilities to a Python agent framework?
---
### Day 5 (Friday): Integration Project + Reflection
- [ ] **Build a mini-project:**
- **Suggested:** A "daily briefing" agent — schedule it to run every morning, have it gather data from APIs (weather, calendar, news), process it, and output a structured briefing. Use the Clock trigger + MCP for external data.
- **Alternative:** Build a webhook-triggered agent that processes incoming data and stores it in GraphRAG
- Save at `~/agent-study/week5-yao/integration_project/`
- [ ] **Write retrospective** at `~/agent-study/notes/week5-retro.md`
- [ ] **Update comparison matrix**
### 🎯 Key Questions:
1. How does Yao's event-driven model differ from the request-response model of every other framework?
2. What are the three trigger modes and when would you use each?
3. What is the six-phase execution model and how does the "Learn" phase create a feedback loop?
4. Why is single-binary deployment a significant advantage? Where would you deploy Yao that you couldn't deploy Python frameworks?
5. How does Yao's built-in GraphRAG differ from vector-only RAG?
6. What does it mean that Yao embeds a V8 engine? What are the implications for extensibility?
7. What types of applications is Yao best suited for vs. worst suited for?
---
## Week 6: MetaGPT
> **Difficulty:** ⭐⭐⭐ (Large codebase, academic concepts, multi-agent complexity)
> **Repo:** [github.com/FoundationAgents/MetaGPT](https://github.com/FoundationAgents/MetaGPT)
> **Stars:** 63k | **Language:** Python | **Papers:** ICLR 2024 + many more
### Why This Is Week 6
MetaGPT is the OG multi-agent framework and the last of the build-focused weeks (Week 7 shifts to deployment, Week 8 to the capstone). It introduces Standard Operating Procedures (SOPs) as the coordination mechanism, a genuinely novel idea that maps human organizational structures onto AI agents. By Week 6, you have enough context from the previous 5 frameworks to deeply appreciate what MetaGPT does differently.
### Resources
| Resource | Link |
|----------|------|
| 📖 Documentation | [docs.deepwisdom.ai](https://docs.deepwisdom.ai/main/en/) |
| 💬 Discord | [Discord](https://discord.gg/ZRHeExS6xv) |
| 📦 PyPI | [metagpt](https://pypi.org/project/metagpt/) |
| 🎯 MGX (commercial product) | [mgx.dev](https://mgx.dev/) |
| 📄 MetaGPT Paper (ICLR 2024) | [openreview.net](https://openreview.net/forum?id=VtmBAGCN7o) |
| 📄 AFlow Paper (ICLR 2025 Oral) | [openreview.net](https://openreview.net/forum?id=z5uVAKwmjf) |
| 📝 Agent 101 Tutorial | [Agent 101](https://docs.deepwisdom.ai/main/en/guide/tutorials/agent_101.html) |
| 📝 MultiAgent 101 | [MultiAgent 101](https://docs.deepwisdom.ai/main/en/guide/tutorials/multi_agent_101.html) |
| 🤗 HuggingFace Demo | [MetaGPT Space](https://huggingface.co/spaces/deepwisdom/MetaGPT-SoftwareCompany) |
### 🗂 Source Code Guide
```
metagpt/
├── roles/ # ⭐ Role definitions — each role = one agent with a job
│ ├── role.py # ⭐ Base Role class — THE core abstraction
│ ├── architect.py # Software architect agent
│ ├── engineer.py # Software engineer agent
│ ├── product_manager.py # Product manager agent
│ ├── project_manager.py # Project manager agent
│ └── di/
│ └── data_interpreter.py # Data analysis agent
├── actions/ # ⭐ Action definitions — what roles can do
│ ├── action.py # Base Action class
│ ├── write_prd.py # Write Product Requirements Document
│ ├── write_design.py # Write system design
│ └── write_code.py # Write code
├── team.py # ⭐ Team orchestration — how roles collaborate via SOPs
├── environment.py # ⭐ Shared environment — message passing between roles
├── schema.py # Message schemas for inter-role communication
├── config2.py # Configuration management
├── base/ # Base classes and utilities
├── memory/ # Memory management for roles
├── software_company.py # ⭐ The "software company" end-to-end pipeline
└── utils/
└── project_repo.py # Project repository management
```
> **💡 Tip:** The mental model is: **Role** (who) performs **Actions** (what) according to **SOPs** (how). Read `roles/role.py` first, then `actions/action.py`, then `team.py`. That's the holy trinity of MetaGPT.
---
### Day 1 (Monday): Architecture Deep Dive
**Read:**
- [ ] Full [README](https://github.com/FoundationAgents/MetaGPT)
- [ ] [Agent 101 Tutorial](https://docs.deepwisdom.ai/main/en/guide/tutorials/agent_101.html)
- [ ] [MultiAgent 101 Tutorial](https://docs.deepwisdom.ai/main/en/guide/tutorials/multi_agent_101.html)
- [ ] MetaGPT paper (abstract + Sections 1-3) — the SOP concept
- [ ] Skim the [AFlow paper](https://openreview.net/forum?id=z5uVAKwmjf) abstract — automated workflow generation
**Core philosophy:** `Code = SOP(Team)`
**Identify core abstractions:**
- **Role** — an agent with a specific job (PM, architect, engineer, etc.)
- **Action** — a discrete task a role can perform (write PRD, write code, etc.)
- **SOP** — Standard Operating Procedures that define the workflow between roles
- **Team** — the orchestrator that manages roles and message passing
- **Environment** — shared context where roles publish and subscribe to messages
- **Message** — typed communication between roles
**The "software company" pipeline:**
```
User Requirement
→ Product Manager (writes PRD)
→ Architect (writes system design)
→ Project Manager (creates task breakdown)
→ Engineer (writes code)
→ QA (tests code)
```
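Before running anything, it's worth internalizing the Role/Action shape in plain Python. This stripped-down imitation is not MetaGPT's API (the real classes in `roles/role.py` and `actions/action.py` add LLM calls, memory, and message subscriptions), but it shows how a role threads context through its actions:

```python
# Stripped-down imitation of MetaGPT's Role/Action abstraction.
# The real classes add LLM calls, memory, and pub/sub; this only shows the shape.

class Action:
    def run(self, context: str) -> str:
        raise NotImplementedError

class WritePRD(Action):
    def run(self, context: str) -> str:
        return f"PRD for: {context}"

class WriteDesign(Action):
    def run(self, context: str) -> str:
        return f"Design based on: {context}"

class Role:
    def __init__(self, name: str, actions: list[Action]):
        self.name, self.actions = name, actions

    def act(self, context: str) -> str:
        # A role runs its actions in order, threading the output through
        for action in self.actions:
            context = action.run(context)
        return context

pm = Role("ProductManager", [WritePRD()])
architect = Role("Architect", [WriteDesign()])

# A miniature SOP: the PM's output becomes the Architect's input
prd = pm.act("a snake game")
design = architect.act(prd)
print(design)
```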
- [ ] **📝 Homework:** Write architecture summary at `~/agent-study/notes/week6-architecture.md`
- Explain the SOP model and how it maps to human organizations
- Compare: SOP coordination vs Graph workflows (MS) vs Event-driven (Yao) vs Linear (Pydantic-AI)
---
### Day 2 (Tuesday): Hello World + Core Concepts
**Setup:**
```bash
cd ~/agent-study/week6-metagpt
conda create -n metagpt python=3.11 && conda activate metagpt
pip install --upgrade metagpt
metagpt --init-config # Creates ~/.metagpt/config2.yaml
# Edit the config to add your API key
```
**Run the classic demo:**
```bash
metagpt "Create a snake game" # This will generate a full project in ./workspace
```
**Also try programmatically:**
```python
from metagpt.software_company import generate_repo
from metagpt.utils.project_repo import ProjectRepo
repo: ProjectRepo = generate_repo("Create a simple calculator app")
print(repo)
```
**And try the Data Interpreter:**
```python
import asyncio
from metagpt.roles.di.data_interpreter import DataInterpreter
async def main():
    di = DataInterpreter()
    await di.run("Run data analysis on sklearn Iris dataset, include a plot")

asyncio.run(main())
```
- [ ] **📝 Homework:** Build a custom role from scratch — NO copy-paste:
- Define a new `Role` subclass with custom `Action`s
- Example: a "ResearchAnalyst" role that takes a topic and produces a structured analysis
- Save at `~/agent-study/week6-metagpt/hello_role.py`
---
### Day 3 (Wednesday): Intermediate Build — Multi-Agent SOPs
**Focus: MetaGPT's unique capability — SOP-based multi-agent coordination**
**Work through:**
- [ ] [MultiAgent 101](https://docs.deepwisdom.ai/main/en/guide/tutorials/multi_agent_101.html)
- [ ] Look at the [Debate example](https://docs.deepwisdom.ai/main/en/guide/use_cases/multi_agent/debate.html)
- [ ] Understand how messages flow between roles via the Environment
- [ ] Understand how the SOP defines which role acts after which
**Key concepts:**
- How roles subscribe to message types from other roles
- How the Team orchestrator manages turn-taking
- How the Environment enables publish/subscribe communication
- How SOPs encode workflow logic without explicit graph definitions
- The difference between the "software company" SOP and custom SOPs
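A minimal publish/subscribe environment captures the message-flow idea. The names and structure below are illustrative, not MetaGPT's actual API:

```python
# Minimal publish/subscribe environment in the spirit of MetaGPT's Environment.
# Roles subscribe to message types; publishing routes messages to subscribers.

from collections import defaultdict

class Environment:
    def __init__(self):
        self.subscribers = defaultdict(list)   # message type -> handlers
        self.history = []

    def subscribe(self, msg_type: str, handler):
        self.subscribers[msg_type].append(handler)

    def publish(self, msg_type: str, content: str):
        self.history.append((msg_type, content))
        for handler in self.subscribers[msg_type]:
            handler(content)

env = Environment()

# Researcher -> Writer -> Editor, wired purely through message types
env.subscribe("research.done", lambda c: env.publish("draft.done", f"draft({c})"))
env.subscribe("draft.done", lambda c: env.publish("final.done", f"edited({c})"))

env.publish("research.done", "notes on agent frameworks")
print(env.history[-1])
```

Notice that no role knows about any other role; the SOP emerges from who subscribes to what.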
- [ ] **📝 Homework:** Build a multi-agent system with a custom SOP:
- **Must include:** At least 3 custom roles with different responsibilities
- **Must include:** Custom message types between roles
- **Must include:** A clear SOP workflow (Role A → Role B → Role C)
- Example idea: A "content creation team" — Researcher (gathers info) → Writer (drafts article) → Editor (reviews and improves) → Publisher (formats final output)
- Save at `~/agent-study/week6-metagpt/multi_agent_sop.py`
---
### Day 4 (Thursday): Advanced Patterns + Source Code Reading
**Read these source files (in order):**
1. `metagpt/roles/role.py` — Base Role class, how roles think and act
2. `metagpt/actions/action.py` — Base Action class, how actions execute
3. `metagpt/team.py` — Team orchestration, turn management
4. `metagpt/environment.py` — Message passing, pub/sub system
5. `metagpt/schema.py` — Message types and schemas
**Also explore:**
- [ ] `metagpt/roles/engineer.py` — how the Engineer role writes code (complex action chain)
- [ ] `metagpt/software_company.py` — the end-to-end pipeline
- [ ] `metagpt/memory/` — how roles maintain memory across turns
- [ ] `examples/` — AFlow and SPO implementations
**Advanced concepts:**
- [ ] How does AFlow (Automated Agentic Workflow Generation) work?
- [ ] What is SPO (Self-Play Optimization)?
- [ ] How does the Data Interpreter differ from the Software Company pipeline?
- [ ] **📝 Homework:** Write "What I'd Steal from MetaGPT" at `~/agent-study/notes/week6-steal.md`
- Focus on: SOP-based coordination, Role/Action abstraction, message-passing environment
- Reflect on: Which coordination model do you prefer? Graph (MS) vs SOP (MetaGPT) vs Event (Yao)?
---
### Day 5 (Friday): Integration Project + Final Reflection
- [ ] **Build a mini-project:**
- **Suggested:** A multi-agent system that takes a business idea and produces a full analysis: Market Researcher role → Business Analyst role → Financial Modeler role → Report Writer role. Each produces a structured output that feeds into the next.
- Save at `~/agent-study/week6-metagpt/integration_project/`
- [ ] **Write retrospective** at `~/agent-study/notes/week6-retro.md`
- This one should be more comprehensive: reflect on all six build-focused weeks
- Which framework would you reach for first? When?
- What surprised you most across the study?
- [ ] **Update comparison matrix** for all 6 frameworks studied so far (the ElizaOS column gets filled in during Week 7)
- [ ] **Commit and push everything** to your study git repo
### 🎯 Key Questions:
1. What does "Code = SOP(Team)" mean concretely?
2. How does the Role/Action/SOP model map to real organizational structures?
3. How do messages flow between roles? What's the pub/sub mechanism?
4. What's the difference between MetaGPT's approach and MS Agent Framework's graph workflows?
5. How does the Data Interpreter feature differ from the Software Company pipeline?
6. What is AFlow and why was it accepted as an oral presentation at ICLR 2025?
7. When would you use MetaGPT vs simpler single-agent frameworks?
8. Across all 6 frameworks, which coordination model (linear/graph/SOP/event) do you think is most general?
---
## Week 7: ElizaOS
> **Timeline:** 1 week | **Difficulty:** ⭐⭐ | **Goal:** Learn agent deployment & multi-platform distribution
> **Repo:** [elizaOS/eliza](https://github.com/elizaOS/eliza) | ⭐ 17,476 | TypeScript
> **Why this week:** Weeks 1-6 taught you how to BUILD agents. This week teaches you how to DEPLOY them where users actually are.
### Why ElizaOS Makes The Cut
After a thorough debate (see the [deep dive analysis](./trending-repos-deep-dive.md)), ElizaOS earned its spot because:
- It's the **only deployment-focused platform** on the trending list — multi-platform routing (Discord, Telegram, Twitter, Farcaster) in one framework
- **17k stars** with active development and a large community
- The plugin architecture, character system, and platform adapters teach **real deployment patterns** you won't learn from any other framework studied
- Knowing how to ship agents to where users live is as important as knowing how to build them
### Resources
| Resource | URL |
|----------|-----|
| **GitHub** | https://github.com/elizaOS/eliza |
| **Docs** | https://elizaos.github.io/eliza/ |
| **Discord** | https://discord.gg/elizaos |
| **Quickstart** | https://elizaos.github.io/eliza/docs/quickstart |
### Key Source Files to Read
| File | Why It Matters |
|------|---------------|
| `packages/core/src/runtime.ts` | The AgentRuntime — the central brain that coordinates everything |
| `packages/core/src/types.ts` | All the core interfaces (Character, Memory, Action, Provider, Evaluator) |
| `packages/plugin-discord/src/index.ts` | How a platform adapter is built — the Discord integration |
| `packages/plugin-telegram/src/index.ts` | Compare with Discord adapter — spot the platform abstraction pattern |
| `packages/core/src/memory.ts` | Memory management — how agents maintain context across platforms |
| `agent/src/index.ts` | The entry point — how everything gets wired together |
---
### Day 1 (Monday): Architecture Deep Dive — The Deployment Platform
**Study (1-2 hrs):**
- Read the full README and quickstart docs
- Understand the core architecture:
- **Character files** — how agent personalities are defined (JSON-based)
- **AgentRuntime** — the central coordinator
- **Plugins** — how platform adapters, actions, and providers are registered
- **Actions vs Evaluators vs Providers** — the three extension points
- **Memory** — how conversation state persists across platforms
- Study the plugin system architecture — how does one agent connect to Discord AND Telegram simultaneously?
- Understand the character file format — what can you configure?
**Key Questions:**
- How does ElizaOS route a message from Discord to the right agent and back?
- What's the difference between an Action, an Evaluator, and a Provider?
- How does the memory system work across platforms? Can an agent remember a Discord convo when talking on Telegram?
- How does the character file influence agent behavior vs hard-coded logic?
**Homework:**
- [ ] Write a 1-page architecture summary covering: runtime → plugins → adapters → memory → character system
- [ ] Draw a diagram showing message flow: User sends Discord message → ... → Agent responds
- [ ] Compare the architecture to Pydantic-AI's approach — what's different about a "deployment-first" vs "logic-first" framework?
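The core idea of the character system, personality as data rather than code, fits in a few lines. ElizaOS uses JSON character files consumed by a TypeScript runtime; this Python sketch with simplified, made-up fields just illustrates the shape:

```python
# Sketch of the "character file" idea: personality as data, not code.
# The fields below are simplified stand-ins, not ElizaOS's actual schema.

import json

character_json = """
{
  "name": "Sage",
  "bio": "A calm research assistant who cites sources.",
  "style": ["concise", "friendly"],
  "example_lines": ["Happy to dig into that for you."]
}
"""

def build_system_prompt(character: dict) -> str:
    # Turn declarative personality config into an LLM system prompt
    return (
        f"You are {character['name']}. {character['bio']} "
        f"Style: {', '.join(character['style'])}."
    )

character = json.loads(character_json)
print(build_system_prompt(character))
```

The payoff: swapping personalities means swapping a JSON file, with zero code changes.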
---
### Day 2 (Tuesday): Hello World — Deploy an Agent to Discord
**Study (1-2 hrs):**
- Set up the ElizaOS development environment
- Clone the repo, install deps (`pnpm install`)
- Create a Discord bot in the Discord Developer Portal (you'll need a test server)
- Set up your `.env` with Discord bot token and an LLM API key
- Create a custom character file for your agent:
- Define name, bio, personality traits, example conversations
- Set the model provider and platform connections
- Run the agent locally, verify it responds in Discord
**Homework:**
- [ ] Create a character file from scratch (no copy-paste from examples) — give it a distinct personality
- [ ] Deploy the agent to your Discord test server and have a 10-message conversation with it
- [ ] Screenshot the conversation and note: What worked? What felt off? How does character configuration affect responses?
---
### Day 3 (Wednesday): Multi-Platform + Plugin System
**Study (1-2 hrs):**
- Add a second platform — connect the same agent to Telegram (or Twitter)
- Same character, same agent, two platforms simultaneously
- Observe: does memory carry across? How does the agent handle platform-specific features?
- Study the plugin architecture:
- Read how `plugin-discord` and `plugin-telegram` are structured
- Understand the `Plugin` interface — what does a plugin provide?
- Look at how Actions work — these are the agent's "tools"
- Write a custom Action plugin:
- Something simple: a weather lookup, a file reader, or a joke generator
- Register it and verify your agent can use it on both platforms
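The adapter pattern you're studying here can be sketched as one agent core behind many platform-specific front doors. ElizaOS implements this as TypeScript plugin packages; this Python version is a conceptual stand-in:

```python
# Sketch of the platform-adapter pattern: one agent core, many front doors.
# Conceptual stand-in for how ElizaOS's plugin-discord/plugin-telegram share a runtime.

class AgentCore:
    def respond(self, user: str, text: str) -> str:
        return f"hello {user}, you said: {text}"

class Adapter:
    platform = "base"

    def __init__(self, core: AgentCore):
        self.core = core

    def handle(self, user: str, text: str) -> str:
        reply = self.core.respond(user, text)  # shared logic
        return self.format(reply)              # platform-specific shape

    def format(self, reply: str) -> str:
        return reply

class DiscordAdapter(Adapter):
    platform = "discord"
    def format(self, reply: str) -> str:
        return f"[embed] {reply}"       # Discord-flavored formatting

class TelegramAdapter(Adapter):
    platform = "telegram"
    def format(self, reply: str) -> str:
        return f"[markdown] {reply}"    # Telegram-flavored formatting

core = AgentCore()                      # same brain...
for a in (DiscordAdapter(core), TelegramAdapter(core)):
    print(a.platform, "->", a.handle("sam", "hi"))   # ...every platform
```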
**Homework:**
- [ ] Run your agent on 2 platforms simultaneously — screenshot both conversations
- [ ] Build a custom Action plugin from scratch and verify it works
- [ ] Write a comparison: how does ElizaOS's plugin system compare to Pydantic-AI's tool system and MetaGPT's action system? What are the trade-offs?
---
### Day 4 (Thursday): Source Code Reading + Advanced Patterns
**Study (1-2 hrs):**
- Read the key source files from the table above, focusing on:
- **runtime.ts** — How does the AgentRuntime process an incoming message? What's the evaluation pipeline?
- **types.ts** — What are all the interfaces? How extensible is the system?
- **memory.ts** — How is conversation history stored and retrieved? What's the embedding strategy?
- Study advanced patterns:
- Multi-agent setups — can you run multiple agents with different characters?
- Custom evaluators — how do you add post-processing logic?
- Custom providers — how do you inject context into every agent response?
- Compare deployment architecture decisions:
- How does ElizaOS handle rate limiting across platforms?
- How does it handle platform-specific message formatting (embeds, buttons, etc.)?
- What's the error handling strategy when a platform adapter fails?
**Homework:**
- [ ] Write a "What I'd Steal From ElizaOS" doc — which patterns are worth using in your own projects? Think:
- Character file abstraction for agent personality
- Plugin registration pattern
- Platform adapter interface
- Memory routing across services
- [ ] Identify the 3 biggest architectural weaknesses (every framework has them)
---
### Day 5 (Friday): Integration Project — Deploy a Week 1-6 Agent
**The real test:** Take an agent you built in Weeks 1-6 and deploy it to at least one chat platform using patterns learned from ElizaOS.
**Options (pick one):**
1. **Pydantic-AI agent → Discord:** Take your structured-output agent from Week 1 and wrap it in a Discord bot using ElizaOS's adapter patterns (or build your own minimal adapter inspired by their architecture)
2. **GPT Researcher → Telegram:** Take your research agent from Week 4 and make it accessible via Telegram — users send a topic, agent researches and responds
3. **Multi-framework pipeline → Discord:** Take your Week 6 MetaGPT multi-agent setup and expose it through a Discord interface where users can kick off the SOP workflow
**Homework:**
- [ ] Deploy a previously-built agent to a real chat platform — it must respond to real messages
- [ ] Write a retrospective for ElizaOS:
- **Strengths:** What does it do better than building your own deployment layer?
- **Weaknesses:** Where is it limited or frustrating?
- **When to use:** What type of project benefits most from ElizaOS?
- **When to skip:** When is it overkill or the wrong tool?
- [ ] Update the comparison matrix with the ElizaOS column
- [ ] Answer: "If I were building a production agent for a client, would I use ElizaOS for deployment or roll my own? Why?"
### Key Questions You Should Be Able to Answer After Week 7
1. How does ElizaOS's character system differ from hardcoding agent personalities?
2. What's the plugin registration lifecycle — from `Plugin` definition to runtime availability?
3. How would you add a completely new platform (e.g., Slack, WhatsApp) to ElizaOS?
4. What are the trade-offs of a deployment-platform approach vs building bespoke platform integrations?
5. How does multi-platform memory work — and where does it break down?
6. When is ElizaOS the right choice vs a simple Discord.js bot?
7. What deployment patterns from ElizaOS would you steal for a custom agent pipeline?
---
## Week 8: Capstone Project
> **Timeline:** 1 week | **Difficulty:** ⭐⭐⭐⭐⭐ | **Goal:** Synthesize learnings from 3+ frameworks
### The Project: "Research → Analyze → Act" Pipeline
Build a system that combines at least 3 of the frameworks you studied:
#### Recommended Architecture
```
┌─────────────────────────────────────────────────────────┐
│ Capstone Pipeline │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌────────────┐ │
│ │ GPT │ │ Pydantic-AI │ │ MetaGPT OR │ │
│ │ Researcher │───▶│ Structured │───▶│ MS Agent │ │
│ │ (Research) │ │ Analysis │ │ Framework │ │
│ │ │ │ Agent │ │ (Execute) │ │
│ └──────────────┘ └──────────────┘ └────────────┘ │
│ │
│ Optional additions: │
│ - Agent-S for browser automation during research │
│ - Yao for scheduling periodic re-research │
└─────────────────────────────────────────────────────────┘
```
#### Requirements
- [ ] **Stage 1: Research** — Use GPT Researcher to conduct deep research on a topic
- [ ] **Stage 2: Analysis** — Use Pydantic-AI to process research into structured data with validated output types
- [ ] **Stage 3: Action** — Use MetaGPT's multi-agent SOP OR MS Agent Framework's graph workflow to generate deliverables from the structured analysis
- [ ] **Integration:** The output of one stage must be the input to the next
- [ ] **Documentation:** Write a README explaining your architecture and design decisions
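A useful way to enforce the "output of one stage is the input of the next" requirement is to write the stage contracts down as types first. A sketch with dataclasses standing in for real framework outputs (all names here are hypothetical):

```python
# Sketch of the three-stage capstone contract: each stage's output type is the
# next stage's input type. Dataclasses stand in for real framework outputs.

from dataclasses import dataclass

@dataclass
class ResearchReport:          # Stage 1 output (GPT Researcher)
    topic: str
    findings: list[str]

@dataclass
class StructuredAnalysis:      # Stage 2 output (Pydantic-AI validated model)
    topic: str
    key_points: list[str]
    confidence: float

def research(topic: str) -> ResearchReport:
    return ResearchReport(topic, [f"finding about {topic}"])

def analyze(report: ResearchReport) -> StructuredAnalysis:
    return StructuredAnalysis(report.topic, report.findings, confidence=0.8)

def act(analysis: StructuredAnalysis) -> str:  # Stage 3 (MetaGPT / MS AF)
    return f"deliverable for {analysis.topic} ({len(analysis.key_points)} key points)"

print(act(analyze(research("agent frameworks"))))
```

With the contracts fixed, you can build and test each stage independently before wiring in the real frameworks.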
#### Stretch Goals
- [ ] Add a Yao scheduled trigger so the pipeline runs daily/weekly
- [ ] Deploy the entire pipeline to Discord/Telegram using ElizaOS patterns from Week 7
- [ ] Add observability (Logfire or OpenTelemetry)
- [ ] Add a web UI (even simple HTML)
- [ ] Use MCP to connect components
- [ ] Add Agent-S for any browser automation steps
#### Deliverables
- [ ] Working code at `~/agent-study/capstone/`
- [ ] `README.md` with architecture diagram and setup instructions
- [ ] `DECISIONS.md` explaining why you chose each framework for each stage
- [ ] `RETROSPECTIVE.md` — final thoughts on the 7-week journey
#### Suggested Topics for the Pipeline
1. **Competitor Analysis Tool** — Research competitors → Structure findings → Generate strategic recommendations
2. **Daily News Briefing** — Research trending topics → Analyze relevance → Generate personalized newsletter
3. **Technical Due Diligence** — Research a technology → Structured pros/cons → Multi-perspective report (architect, PM, engineer roles)
4. **Market Research Report** — Research a market → Structured data extraction → Executive summary + detailed report
---
## Appendix: Comparison Matrix Template
Save this at `~/agent-study/comparison-matrix/matrix.md` and fill it in weekly:
```markdown
# AI Agent Framework Comparison Matrix
| Dimension | Pydantic-AI | MS Agent Framework | Agent-S | GPT Researcher | Yao | MetaGPT | ElizaOS |
|-----------|-------------|-------------------|---------|----------------|-----|---------|---------|
| **Language** | Python | Python + .NET | Python | Python | Go | Python | TypeScript |
| **Stars** | 14.6k | 7k | 9.6k | 25k | 7.5k | 63k | 17k |
| **Agent Definition** | | | | | | | |
| **Tool Integration** | | | | | | | |
| **Multi-Agent Coord.** | | | | | | | |
| **Error Handling** | | | | | | | |
| **Observability** | | | | | | | |
| **Type Safety** | | | | | | | |
| **DX / Ergonomics** | | | | | | | |
| **Production Readiness** | | | | | | | |
| **Unique Superpower** | | | | | | | |
| **Biggest Weakness** | | | | | | | |
| **Best Use Case** | | | | | | | |
| **Would I Use For...** | | | | | | | |
| **Overall Rating (1-10)** | | | | | | | |
```
---
## 📊 Week-by-Week Schedule Overview
| Week | Framework | Focus | Difficulty | Key Deliverables |
|------|-----------|-------|------------|------------------|
| 0 | Prep | Setup & background reading | ⭐ | Environment ready, papers skimmed |
| 1 | Pydantic-AI | Type-safe agents, DI, structured output | ⭐⭐ | Architecture doc, 3 agents, steal doc |
| 2 | MS Agent Framework | Graph workflows, DevUI, enterprise patterns | ⭐⭐⭐ | Graph workflow, DevUI screenshots, steal doc |
| 3 | Agent-S | Computer use, visual grounding, screenshots | ⭐⭐⭐⭐ | Computer use demo, architecture analysis |
| 4 | GPT Researcher | Deep research, Plan-and-Solve, RAG | ⭐⭐ | Research agent, MCP integration |
| 5 | Yao | Event-driven agents, Go, single binary, GraphRAG | ⭐⭐⭐⭐ | Event-driven agent, DSL exploration |
| 6 | MetaGPT | SOPs, multi-agent teams, roles/actions | ⭐⭐⭐ | Multi-agent SOP, comparison matrix |
| 7 | ElizaOS | Deployment, multi-platform distribution, plugins | ⭐⭐ | Multi-platform agent, custom plugin, deployed agent from Weeks 1-6 |
| 8 | Capstone | Integrate 3+ frameworks | ⭐⭐⭐⭐⭐ | Working pipeline, docs, retrospective |
---
## 🏁 Success Criteria
After completing this study plan, you should be able to:
1. **Explain** the architecture of each framework from memory (whiteboard test)
2. **Build** a production-grade agent with Pydantic-AI from scratch
3. **Design** a graph workflow for a complex multi-step process
4. **Understand** computer-use agent architecture and its limitations
5. **Implement** a Plan-and-Solve research pipeline
6. **Compare** event-driven vs request-response agent architectures
7. **Deploy** an agent to Discord/Telegram and understand multi-platform routing patterns
8. **Choose** the right framework for a given problem with clear reasoning
9. **Read** any agent framework's source code and quickly identify its core abstractions
> *"The goal isn't to memorize APIs. It's to build intuition for how agent systems are designed, so you can build your own or extend existing ones with confidence."*
---
*Generated by Clawdbot | February 4, 2026*