# 🧠 AI Agent Frameworks — 8-Week Deep Study Plan

> **Goal:** Go from "I've heard of these" to "I could build & deploy production systems with these" in 8 weeks.
> **Time commitment:** ~1-2 hours/day, Mon-Fri
> **Based on:** [Trending Repos Deep Dive Analysis](./trending-repos-deep-dive.md) (Feb 2026)
> **Last updated:** February 4, 2026

---

## 📋 Table of Contents

- [Week 0: Prep & Prerequisites](#week-0-prep--prerequisites)
- [Week 1: Pydantic-AI](#week-1-pydantic-ai) — The Production SDK ⭐⭐
- [Week 2: Microsoft Agent Framework](#week-2-microsoft-agent-framework) — Enterprise Orchestration ⭐⭐⭐
- [Week 3: Agent-S](#week-3-agent-s) — Computer Use Pioneer ⭐⭐⭐⭐
- [Week 4: GPT Researcher](#week-4-gpt-researcher) — Deep Research Agent ⭐⭐
- [Week 5: Yao](#week-5-yao) — Event-Driven Agents in Go ⭐⭐⭐⭐
- [Week 6: MetaGPT](#week-6-metagpt) — Multi-Agent SOP Framework ⭐⭐⭐
- [Week 7: ElizaOS](#week-7-elizaos) — Deployment & Multi-Platform Distribution ⭐⭐
- [Week 8: Capstone Project](#week-8-capstone-project)
- [Appendix: Comparison Matrix Template](#appendix-comparison-matrix-template)

> ⭐ = Difficulty Rating (1-5). More stars = harder week.

---
## Week 0: Prep & Prerequisites

> **Timeline:** The weekend before you start. ~3-4 hours total.

### Environment Setup

- [ ] **Python 3.11+** installed (`python --version`)
- [ ] **Go 1.21+** installed for Week 5 (`go version`)
- [ ] **Node.js 18+** and `pnpm` installed (needed for ElizaOS in Week 7; MetaGPT and Yao also use Node tooling)
- [ ] **Docker Desktop** installed and running
- [ ] **Git** configured with SSH keys for cloning repos
- [ ] **VS Code** (or your editor) with Python + Go extensions
- [ ] **A GPU or cloud GPU access** (optional, helps for the Agent-S grounding model)

### API Keys & Accounts

- [ ] **OpenAI API key** — used by almost every framework
- [ ] **Anthropic API key** — primary for Pydantic-AI examples
- [ ] **Tavily API key** — required for GPT Researcher (free tier works: [app.tavily.com](https://app.tavily.com))
- [ ] **Azure OpenAI access** — needed for Microsoft Agent Framework (free trial available)
- [ ] **Hugging Face account + token** — needed for the Agent-S grounding model
- [ ] **Google API key** — optional, for Gemini-based features in GPT Researcher
### Workspace Setup

```bash
# Create a clean workspace for all 8 weeks
mkdir -p ~/agent-study/{week1-pydantic-ai,week2-ms-agent,week3-agent-s,week4-gpt-researcher,week5-yao,week6-metagpt,week7-elizaos,capstone}
mkdir -p ~/agent-study/notes
mkdir -p ~/agent-study/comparison-matrix

# Initialize a git repo for your study notes
cd ~/agent-study
git init
echo "# AI Agent Frameworks Study" > README.md
git add . && git commit -m "init study workspace"
```

### Background Reading (1-2 hours)

Read these before Week 1. They're the conceptual foundation:

- [ ] **[Plan-and-Solve Prompting](https://arxiv.org/abs/2305.04091)** — The paper behind GPT Researcher's architecture. Skim the abstract + Section 3.
- [ ] **[RAG paper](https://arxiv.org/abs/2005.11401)** — Core concept used by multiple frameworks. Read the abstract + intro.
- [ ] **[Model Context Protocol (MCP) spec](https://modelcontextprotocol.io/)** — Anthropic's protocol for tool integration. Read the overview page.
- [ ] **[Agent2Agent (A2A) protocol](https://google.github.io/A2A/)** — Google's agent interop standard. Skim the spec overview.
- [ ] **[Pydantic docs (crash course)](https://docs.pydantic.dev/latest/concepts/models/)** — If you're rusty on Pydantic, spend 30 min here. It's the foundation of Week 1.
### Mental Model to Build

Every agent framework answers the same 5 questions differently:

1. **How do you define an agent?** (class, function, config, DSL)
2. **How do agents use tools?** (function calling, MCP, code execution)
3. **How do multiple agents coordinate?** (graph, SOP, message passing, events)
4. **How do you handle errors & retries?** (automatic, manual, durable execution)
5. **How do you observe what happened?** (logging, tracing, replay)

Keep these questions in mind every week. By Week 7, you'll have 7 different answers for each.

---
## Week 1: Pydantic-AI

> **Difficulty:** ⭐⭐ (Approachable — excellent docs, familiar Python patterns)
> **Repo:** [github.com/pydantic/pydantic-ai](https://github.com/pydantic/pydantic-ai)
> **Stars:** 14.6k | **Language:** Python | **Version:** v1.52.0+

### Why This Is Week 1

Pydantic-AI is the most ergonomic agent framework and has the best docs. Starting here builds your mental model for how agent SDKs *should* feel. Everything after this week will be compared to Pydantic-AI's developer experience. It's the FastAPI of agents — you'll understand why once you use it.

### Resources

| Resource | Link |
|----------|------|
| 📖 Documentation | [ai.pydantic.dev](https://ai.pydantic.dev/) |
| 💬 Community (Slack) | [Pydantic Slack](https://logfire.pydantic.dev/docs/join-slack/) |
| 📦 PyPI | [pydantic-ai](https://pypi.org/project/pydantic-ai/) |
| 🔭 Observability | [Pydantic Logfire](https://pydantic.dev/logfire) |
| 📝 Blog: How it was built | [Pydantic blog](https://pydantic.dev/articles) |
| 🎥 Intro video | Search "Pydantic AI tutorial 2025" on YouTube |

### 🗂 Source Code Guide — "Read THESE Files"

```
pydantic_ai_slim/pydantic_ai/
├── agent/
│   └── __init__.py      # ⭐ THE file. Agent class definition, run(), run_sync(), run_stream()
├── _agent_graph.py      # ⭐ Internal agent execution graph — how runs actually execute
├── tools.py             # ⭐ Tool decorator, RunContext, tool schema generation
├── result.py            # ⭐ RunResult, StreamedRunResult — output handling
├── models/
│   ├── __init__.py      # Model ABC — how all model providers implement the same interface
│   ├── openai.py        # OpenAI provider implementation
│   └── anthropic.py     # Anthropic provider implementation
├── _a2a.py              # Agent2Agent protocol integration
├── mcp.py               # MCP client/server integration
└── _output.py           # Output type handling, Pydantic validation on LLM outputs
```

> **💡 Tip:** Start with `agent/__init__.py`. It's beautifully documented with docstrings. Then read `tools.py` to understand how the `@agent.tool` decorator works. Finally, read `_agent_graph.py` to see how the runtime orchestrates tool calls.

---

### Day 1 (Monday): Architecture Deep Dive

**Read:**
- [ ] The full [README](https://github.com/pydantic/pydantic-ai)
- [ ] Docs: [Introduction](https://ai.pydantic.dev/)
- [ ] Docs: [Agents](https://ai.pydantic.dev/agents)
- [ ] Docs: [Models Overview](https://ai.pydantic.dev/models/overview)
- [ ] Docs: [Tools](https://ai.pydantic.dev/tools)
- [ ] Docs: [Output / Structured Results](https://ai.pydantic.dev/output)
- [ ] Docs: [Dependency Injection](https://ai.pydantic.dev/dependencies) — if that page has moved in your docs version, study the DI pattern in the bank support example instead

**Identify core abstractions:**
- `Agent` — the central class (generic over deps + output type)
- `RunContext` — carries dependencies into tool functions
- `Tool` — decorated functions the LLM can call
- `ModelSettings` — per-request model configuration
- `RunResult` / `StreamedRunResult` — typed output containers

**Understand the execution flow:**
```
User prompt → Agent.run() → Model call → [Tool call → Tool execution → Model call]* → Validated output
```
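
That loop is worth internalizing before touching the SDK. Here is a framework-free sketch of it — the model, tool registry, and message shapes below are illustrative stand-ins, not Pydantic-AI's API:

```python
from dataclasses import dataclass

# Illustrative stand-ins for a model and a tool registry — NOT Pydantic-AI's API.
@dataclass
class ToolCall:
    name: str
    args: dict

def fake_model(messages):
    """Pretend LLM: requests the weather tool once, then gives a final answer."""
    if not any(m["role"] == "tool" for m in messages):
        return ToolCall("get_weather", {"city": "Paris"})
    return "It is sunny in Paris."

def get_weather(city: str) -> str:
    return f"sunny in {city}"

TOOLS = {"get_weather": get_weather}

def run(prompt: str) -> str:
    messages = [{"role": "user", "content": prompt}]
    while True:
        response = fake_model(messages)
        if isinstance(response, ToolCall):            # model wants a tool
            result = TOOLS[response.name](**response.args)
            messages.append({"role": "tool", "content": result})
            continue                                  # model sees the tool output next turn
        return response                               # plain text = final output

print(run("Weather in Paris?"))  # It is sunny in Paris.
```

The real framework adds typing, validation, and retries around exactly this skeleton — keep it in mind while reading `_agent_graph.py`.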

- [ ] **📝 Homework:** Write a 1-page architecture summary at `~/agent-study/notes/week1-architecture.md`
  - Cover: Agent lifecycle, dependency injection pattern, how tools are registered and called, how output validation works
  - Draw a simple diagram (ASCII or hand-drawn photo is fine)

---

### Day 2 (Tuesday): Hello World + Core Concepts

**Setup:**
```bash
cd ~/agent-study/week1-pydantic-ai
python -m venv .venv && source .venv/bin/activate
pip install pydantic-ai
```

**Run the quickstart:**
```python
from pydantic_ai import Agent

agent = Agent(
    'anthropic:claude-sonnet-4-0',
    instructions='Be concise, reply with one sentence.',
)

result = agent.run_sync('Where does "hello world" come from?')
print(result.output)
```

**Understand the core API surface:**
- [ ] `agent.run()` vs `agent.run_sync()` vs `agent.run_stream()`
- [ ] How `instructions` work (static string vs dynamic function)
- [ ] How model selection works (string shorthand vs model objects)
- [ ] How `result.output` is typed
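
The first of those distinctions is a general SDK pattern: `run_sync()` is usually a thin convenience wrapper that drives the async `run()` to completion. A sketch of that general pattern — not Pydantic-AI's actual implementation:

```python
import asyncio

class MiniAgent:
    """Sketch of the sync-over-async pattern many SDKs use; illustrative only."""

    async def run(self, prompt: str) -> str:
        await asyncio.sleep(0)          # stands in for the real (async) model call
        return f"answer to: {prompt}"

    def run_sync(self, prompt: str) -> str:
        # Convenience wrapper: drive the async run() to completion.
        # (Real SDKs are more careful about already-running event loops.)
        return asyncio.run(self.run(prompt))

print(MiniAgent().run_sync("hi"))  # answer to: hi
```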

- [ ] **📝 Homework:** Build the simplest agent from scratch — NO copy-paste
  - Requirements: takes a topic, returns a structured output (use a Pydantic model as the output type)
  - Must use at least one custom instruction
  - Save at `~/agent-study/week1-pydantic-ai/hello_agent.py`

---

### Day 3 (Wednesday): Intermediate Build — Structured Output + DI

**Focus: Pydantic-AI's killer features — type-safe structured output and dependency injection**

**Work through:**
- [ ] The [bank support agent example](https://ai.pydantic.dev/#tools-dependency-injection-example) from the docs
- [ ] Docs: [Structured Output / Streamed Results](https://ai.pydantic.dev/output#streamed-results)
- [ ] Docs: [Graph Support](https://ai.pydantic.dev/graph)

**Key concepts to grok:**
- How `RunContext[DepsType]` carries typed dependencies
- How Pydantic models as output types create validated, structured responses
- How tool docstrings become the tool description sent to the LLM
- How streaming works with structured output (partial validation!)
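
The core trick behind structured output — validate the model's free-form text against a schema and turn failures into retry feedback — can be sketched with the stdlib. The recipe-style schema below is illustrative; in the framework a Pydantic model plays this role:

```python
import json

def validate_recipe(raw: str) -> dict:
    """Parse LLM output and enforce a minimal 'schema' — a stdlib stand-in
    for what a Pydantic output model's validation does in the framework."""
    data = json.loads(raw)
    for field, typ in (("name", str), ("minutes", int)):
        if not isinstance(data.get(field), typ):
            raise ValueError(f"field {field!r} must be {typ.__name__}")
    return data

good = '{"name": "Pancakes", "minutes": 20}'
bad = '{"name": "Pancakes", "minutes": "twenty"}'

print(validate_recipe(good))
try:
    validate_recipe(bad)
except ValueError as e:
    # In the framework, an error like this is fed back to the model as a retry prompt.
    print(f"retry with feedback: {e}")
```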

- [ ] **📝 Homework:** Build an agent that uses the framework's unique capabilities:
  - **Must include:** Dependency injection with a real dependency (database mock, API client, etc.)
  - **Must include:** Structured output via a Pydantic model (not just string output)
  - **Must include:** At least 2 tools
  - Example idea: A "recipe finder" agent with deps for a recipe database, tools for searching and filtering, output as a structured `Recipe` model
  - Save at `~/agent-study/week1-pydantic-ai/structured_agent.py`

---

### Day 4 (Thursday): Advanced Patterns + Source Code Reading

**Read these source files (in order):**
1. `pydantic_ai_slim/pydantic_ai/agent/__init__.py` — How the `Agent` class is defined, the generic type parameters
2. `pydantic_ai_slim/pydantic_ai/tools.py` — How `@tool` works, schema generation, `RunContext`
3. `pydantic_ai_slim/pydantic_ai/_agent_graph.py` — The internal execution engine
4. `pydantic_ai_slim/pydantic_ai/result.py` — How results are wrapped, streamed, validated
5. `pydantic_ai_slim/pydantic_ai/models/__init__.py` — The model provider ABC

**Understand:**
- [ ] How errors from tool execution are passed back to the LLM for retry
- [ ] How streaming works internally (incremental Pydantic validation)
- [ ] How `_agent_graph.py` orchestrates the conversation loop
- [ ] How durable execution checkpoints work

**Explore advanced features:**
- [ ] Docs: [Durable Execution](https://ai.pydantic.dev/durable_execution/overview/)
- [ ] Docs: [MCP Integration](https://ai.pydantic.dev/mcp/overview)
- [ ] Docs: [Human-in-the-Loop](https://ai.pydantic.dev/deferred-tools)
- [ ] Docs: [Evals](https://ai.pydantic.dev/evals)

- [ ] **📝 Homework:** Write "What I'd Steal from Pydantic-AI" at `~/agent-study/notes/week1-steal.md`
  - Focus on: DI pattern, type-safe generics, streaming validation, tool retry pattern
  - What design decisions are genius? What would you do differently?

---

### Day 5 (Friday): Integration Project + Reflection

- [ ] **Build a mini-project** that integrates with something real:
  - **Suggested:** An agent that queries a real API (weather, GitHub, Hacker News), processes the data through tools, and returns a structured report as a Pydantic model
  - **Bonus:** Add Logfire observability (it has a free tier) and see the traces
  - **Bonus:** Expose it as an MCP server
  - Save at `~/agent-study/week1-pydantic-ai/integration_project/`

- [ ] **Write retrospective** at `~/agent-study/notes/week1-retro.md`:
  - Strengths of Pydantic-AI
  - Weaknesses / gaps you noticed
  - When would you reach for this vs building from scratch?
  - What surprised you?

- [ ] **Start comparison matrix** at `~/agent-study/comparison-matrix/matrix.md` (see [template](#appendix-comparison-matrix-template))

### 🎯 Key Questions — You Should Be Able to Answer:

1. What does the `Agent` class generic signature `Agent[DepsType, OutputType]` buy you?
2. How does dependency injection work in Pydantic-AI and why is it better than global state?
3. How does Pydantic-AI validate structured output from an LLM that returns free-form text?
4. What happens when a tool call fails? How does the retry loop work?
5. What's the difference between `run()`, `run_sync()`, and `run_stream()`?
6. How would you add a new model provider to Pydantic-AI?
7. What is durable execution and when would you use it?

---

## Week 2: Microsoft Agent Framework

> **Difficulty:** ⭐⭐⭐ (Larger surface area, graph concepts, mono-repo navigation)
> **Repo:** [github.com/microsoft/agent-framework](https://github.com/microsoft/agent-framework)
> **Stars:** 7k | **Languages:** Python + .NET | **Born from:** Semantic Kernel + AutoGen

### Why This Is Week 2

If Pydantic-AI is the developer's choice, Microsoft Agent Framework is the enterprise's choice. It introduces graph-based workflows — a fundamentally different orchestration model from the simple agent loop you learned in Week 1. Understanding this framework means understanding where corporate AI agent development is heading.

### Resources

| Resource | Link |
|----------|------|
| 📖 Documentation | [learn.microsoft.com/agent-framework](https://learn.microsoft.com/en-us/agent-framework/) |
| 🚀 Quick Start | [Quick Start Tutorial](https://learn.microsoft.com/agent-framework/tutorials/quick-start) |
| 💬 Discord | [Discord](https://discord.gg/b5zjErwbQM) |
| 🎥 Intro Video (30 min) | [YouTube](https://www.youtube.com/watch?v=AAgdMhftj8w) |
| 🎥 DevUI Demo (1 min) | [YouTube](https://www.youtube.com/watch?v=mOAaGY4WPvc) |
| 📦 PyPI | [agent-framework](https://pypi.org/project/agent-framework/) |
| 📝 Migration from SK | [Semantic Kernel Migration](https://learn.microsoft.com/en-us/agent-framework/migration-guide/from-semantic-kernel) |
| 📝 Migration from AutoGen | [AutoGen Migration](https://learn.microsoft.com/en-us/agent-framework/migration-guide/from-autogen) |

### 🗂 Source Code Guide

```
python/packages/
├── agent-framework/         # ⭐ Core package — agents, middleware, workflows
│   └── src/agent_framework/
│       ├── agents/          # Agent base classes and implementations
│       ├── workflows/       # ⭐ Graph-based workflow engine
│       └── middleware/      # ⭐ Request/response middleware pipeline
├── azure-ai/                # Azure AI provider (Responses API)
├── openai/                  # OpenAI provider
├── anthropic/               # Anthropic provider
├── devui/                   # ⭐ Developer UI for debugging workflows
├── mcp/                     # MCP integration
├── a2a/                     # Agent2Agent protocol
└── lab/                     # Experimental features (benchmarking, RL)

python/samples/getting_started/
├── agents/                  # ⭐ Start here — basic agent examples
├── workflows/               # ⭐ Graph workflow examples (critical!)
├── middleware/              # Middleware examples
└── observability/           # OpenTelemetry integration
```

> **💡 Tip:** This is a mono-repo. Don't try to read everything. Focus on `python/packages/agent-framework/` for the core, and `python/samples/getting_started/workflows/` for the graph workflow examples.

---

### Day 1 (Monday): Architecture Deep Dive

**Read:**
- [ ] [Overview](https://learn.microsoft.com/agent-framework/overview/agent-framework-overview)
- [ ] The full [README](https://github.com/microsoft/agent-framework)
- [ ] [User Guide Overview](https://learn.microsoft.com/en-us/agent-framework/user-guide/overview)
- [ ] Watch the [30-min intro video](https://www.youtube.com/watch?v=AAgdMhftj8w) (at 1.5x speed)
- [ ] Skim the [SK migration guide](https://learn.microsoft.com/en-us/agent-framework/migration-guide/from-semantic-kernel) to understand the lineage

**Identify core abstractions:**
- `Agent` — base agent interface
- `Workflow` / `Graph` — the graph-based orchestration system
- `Middleware` — request/response processing pipeline
- `AgentProvider` — LLM provider abstraction
- `DevUI` — visual debugging tool

**Key architectural insight:** This framework uses a **data-flow graph** model where nodes are agents or functions, and edges carry data between them. This is fundamentally different from Pydantic-AI's linear agent loop.

- [ ] **📝 Homework:** Write a 1-page architecture summary at `~/agent-study/notes/week2-architecture.md`
  - Compare the graph workflow model to Pydantic-AI's linear model
  - Draw the graph workflow concept (nodes = agents/functions, edges = data flow)

---

### Day 2 (Tuesday): Hello World + Core Concepts

**Setup:**
```bash
cd ~/agent-study/week2-ms-agent
python -m venv .venv && source .venv/bin/activate
pip install agent-framework --pre
# You'll need Azure credentials or an OpenAI key
```

**Run the quickstart:**
```python
import asyncio

from agent_framework.openai import OpenAIChatClient

async def main():
    agent = OpenAIChatClient(
        api_key="your-key"
    ).as_agent(
        name="HaikuBot",
        instructions="You are an upbeat assistant that writes beautifully.",
    )
    print(await agent.run("Write a haiku about AI agents."))

asyncio.run(main())
```

**Understand:**
- [ ] The `as_agent()` pattern — how providers become agents
- [ ] The difference between Chat agents and Responses agents
- [ ] How the Python API differs from the .NET API (skim a .NET example)

- [ ] **📝 Homework:** Build the simplest agent from scratch — NO copy-paste
  - Save at `~/agent-study/week2-ms-agent/hello_agent.py`

---

### Day 3 (Wednesday): Intermediate Build — Graph Workflows

**This is the key differentiator. This is the day that matters.**

**Work through:**
- [ ] `python/samples/getting_started/workflows/` — all examples
- [ ] Docs: Workflow/Graph tutorials on learn.microsoft.com
- [ ] Understand streaming, checkpointing, and time-travel in graphs

**Key concepts:**
- How nodes in a graph can be agents OR deterministic functions
- How data flows between nodes via typed edges
- How checkpointing enables pause/resume of long-running workflows
- How human-in-the-loop fits into the graph model
- How time-travel lets you replay/debug workflows
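
Before using the real workflow engine, it helps to miniaturize the idea: nodes are agents or plain functions, each node's return value selects the next edge, and a checkpoint is just a snapshot of state taken before each node runs. All names below are illustrative, not the framework's API:

```python
# Minimal dataflow graph: nodes are callables, conditional edges, checkpointed state.
def research(state):
    state["facts"] = state.get("facts", 0) + 2
    return "format" if state["facts"] >= 3 else "research"   # conditional edge

def format_notes(state):
    state["notes"] = f"{state['facts']} facts"
    return "write"

def write(state):
    state["post"] = f"Post based on {state['notes']}"
    return None                                              # terminal node

NODES = {"research": research, "format": format_notes, "write": write}

def run_graph(start, state, checkpoints):
    node = start
    while node:
        # A checkpoint before each node is what enables pause/resume and "time-travel":
        # replay from any saved (node, state) pair.
        checkpoints.append((node, dict(state)))
        node = NODES[node](state)
    return state

cp = []
final = run_graph("research", {}, cp)
print(final["post"])  # Post based on 4 facts
```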

- [ ] **📝 Homework:** Build a graph workflow:
  - **Must include:** At least 3 nodes (mix of agent nodes and function nodes)
  - **Must include:** Branching logic (conditional edges)
  - Example idea: A "content pipeline" — Node 1 (agent: research a topic) → Node 2 (function: format research) → Node 3 (agent: write blog post) with a branch for "needs more research"
  - Save at `~/agent-study/week2-ms-agent/graph_workflow.py`

---

### Day 4 (Thursday): Advanced Patterns + Source Code Reading

**Read these source files:**
1. Core agent base classes in `python/packages/agent-framework/`
2. The workflow/graph engine implementation
3. The middleware pipeline implementation
4. The DevUI package structure
5. At least one provider implementation (OpenAI or Azure)

**Explore:**
- [ ] Set up and run the **DevUI** — visualize your graph workflow from Day 3
- [ ] Look at the **OpenTelemetry integration** — `python/samples/getting_started/observability/`
- [ ] Read the **middleware examples** — understand the request/response pipeline
- [ ] Check out the **lab package** — what's experimental?
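
The middleware idea is the same onion pattern web frameworks use: each middleware wraps the next handler and can touch the request on the way in and the response on the way out. A minimal sketch (illustrative, not the framework's API):

```python
# Request/response middleware chain in miniature.
def logging_mw(request, next_handler):
    response = next_handler(request)          # call inward, then act on the response
    return response + " [logged]"

def redact_mw(request, next_handler):
    return next_handler(request.replace("secret", "***"))   # act on the request

def agent(request):
    return f"handled: {request}"              # innermost handler (the agent itself)

def build_pipeline(middlewares, handler):
    # Wrap from the inside out so the first middleware in the list runs first.
    for mw in reversed(middlewares):
        handler = (lambda m, nxt: lambda req: m(req, nxt))(mw, handler)
    return handler

pipeline = build_pipeline([logging_mw, redact_mw], agent)
print(pipeline("my secret plan"))  # handled: my *** plan [logged]
```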

- [ ] **📝 Homework:** Write "What I'd Steal from MS Agent Framework" at `~/agent-study/notes/week2-steal.md`
  - Focus on: Graph workflow model, DevUI concept, middleware pipeline, multi-language support
  - Compare to Pydantic-AI: when would you choose one over the other?

---

### Day 5 (Friday): Integration Project + Reflection

- [ ] **Build a mini-project:**
  - **Suggested:** A multi-step data processing pipeline using graph workflows
  - Must have: at least one agent node calling an LLM, at least one pure function node, checkpointing enabled
  - **Bonus:** Get the DevUI running and screenshot your workflow visualization
  - Save at `~/agent-study/week2-ms-agent/integration_project/`

- [ ] **Write retrospective** at `~/agent-study/notes/week2-retro.md`
- [ ] **Update comparison matrix** — add an MS Agent Framework entry

### 🎯 Key Questions:

1. What's the difference between a linear agent loop and a graph-based workflow?
2. How does checkpointing work in MS Agent Framework workflows?
3. What does "time-travel" mean in the context of agent debugging?
4. How does the middleware pipeline work and when would you use it?
5. What's the DevUI and what can you debug with it that you can't with logs alone?
6. How does this framework's agent abstraction compare to Pydantic-AI's `Agent` class?
7. When would you choose MS Agent Framework over Pydantic-AI? (Think: team size, workflow complexity, language requirements)

---

## Week 3: Agent-S

> **Difficulty:** ⭐⭐⭐⭐ (Requires GPU for grounding model, novel paradigm, research-grade code)
> **Repo:** [github.com/simular-ai/Agent-S](https://github.com/simular-ai/Agent-S)
> **Stars:** 9.6k | **Language:** Python | **Papers:** ICLR 2025, COLM 2025

### Why This Is Week 3

This is a completely different paradigm. Weeks 1-2 were about agents that work with APIs and text. Agent-S works with **pixels and clicks** — it uses your computer like a human does. This is the frontier of agent development. Understanding Agent-S means understanding where computer-use agents are heading.

### Resources

| Resource | Link |
|----------|------|
| 📖 Repo | [github.com/simular-ai/Agent-S](https://github.com/simular-ai/Agent-S) |
| 💬 Discord | [Discord](https://discord.gg/E2XfsK9fPV) |
| 📄 S1 Paper (ICLR 2025) | [arxiv.org/abs/2410.08164](https://arxiv.org/abs/2410.08164) |
| 📄 S2 Paper (COLM 2025) | [arxiv.org/abs/2504.00906](https://arxiv.org/abs/2504.00906) |
| 📄 S3 Paper | [arxiv.org/abs/2510.02250](https://arxiv.org/abs/2510.02250) |
| 🌐 S3 Blog | [simular.ai/articles/agent-s3](https://www.simular.ai/articles/agent-s3) |
| 🎥 S3 Video | [YouTube](https://www.youtube.com/watch?v=VHr0a3UBsh4) |
| 📦 PyPI | [gui-agents](https://pypi.org/project/gui-agents/) |
| 🤗 Grounding Model | [UI-TARS-1.5-7B](https://huggingface.co/ByteDance-Seed/UI-TARS-1.5-7B) |

### 🗂 Source Code Guide

```
gui_agents/
├── s3/                  # ⭐ Latest version — start here
│   ├── cli_app.py       # ⭐ Entry point — CLI application, main loop
│   ├── agents/          # ⭐ Agent implementations (planning, grounding, execution)
│   ├── core/            # ⭐ Core abstractions (screenshot, actions, state)
│   ├── bbon/            # Behavior Best-of-N — sampling strategy for better performance
│   └── prompts/         # System prompts for each agent role
├── s2/                  # Previous version
├── s2_5/                # Intermediate version
├── s1/                  # Original version (ICLR 2025)
└── utils.py             # Shared utilities
```

> **💡 Tip:** Focus entirely on `gui_agents/s3/`. Read the papers' system diagrams first, THEN the code. The code makes 10x more sense with the paper's architecture diagram in front of you.

> **⚠️ Setup Note:** Agent-S requires a grounding model (UI-TARS-1.5-7B). You can host it on Hugging Face Inference Endpoints (~$1-2/hr for an A10G), use a free tier if available, or run it locally if you have a capable GPU (16GB+ VRAM). Alternatively, study the code architecture without running the full system.

---

### Day 1 (Monday): Architecture Deep Dive

**Read:**
- [ ] The full [README](https://github.com/simular-ai/Agent-S)
- [ ] [S3 blog post](https://www.simular.ai/articles/agent-s3) — accessible overview
- [ ] **S1 Paper** (at least the abstract + Sections 1-3) — core architecture concepts
- [ ] **S3 Paper** (abstract + architecture section) — latest improvements
- [ ] `models.md` in the repo — supported model configurations

**Identify core abstractions:**
- **Screenshot Capture** — the agent "sees" the screen as an image
- **Grounding Model** (UI-TARS) — converts screenshots to UI element locations
- **Planning Agent** — decides what to do based on the current screen + goal
- **Execution Agent** — translates plans into mouse/keyboard actions
- **Behavior Best-of-N (bBoN)** — run multiple rollouts, pick the best

**The pipeline:**
```
Task → Screenshot → Grounding (UI-TARS: identify elements) → Planning (LLM: what to do) → Action (click/type/scroll) → New Screenshot → Loop
```
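
To make that control flow concrete, here is the loop with every stage mocked out — `ground` and `plan` below are stubs standing in for the UI-TARS grounding model and the planning LLM, and the element names are invented for illustration:

```python
# Mocked computer-use loop: each stage is a stub so the control flow is visible.
def screenshot(step):
    return f"screen_{step}"                 # stands in for a pixel capture

def ground(image):
    # Grounding model: image → named UI elements with pixel coordinates
    return {"search_box": (100, 20), "submit": (100, 60)}

def plan(goal, elements, history):
    # Planning LLM: pick the next action based on goal + what's on screen, or stop
    if len(history) == 0:
        return ("type", "search_box", goal)
    if len(history) == 1:
        return ("click", "submit", None)
    return ("done", None, None)

def run_task(goal, max_steps=5):
    history = []
    for step in range(max_steps):
        elements = ground(screenshot(step))             # see the current screen
        action, target, arg = plan(goal, elements, history)
        if action == "done":
            return history
        history.append((action, elements[target], arg))  # "execute" on coordinates
    return history

print(run_task("weather in Paris"))
```

The real system replaces each stub with a model call and OS-level input events, but the loop shape is the same.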

- [ ] **📝 Homework:** Write an architecture summary at `~/agent-study/notes/week3-architecture.md`
  - Include the screenshot → grounding → planning → action pipeline
  - Explain bBoN and why it matters (72.6% vs 66% on OSWorld)
  - Compare: how is "seeing" a screen different from "calling" an API?

---

### Day 2 (Tuesday): Hello World + Core Concepts

**Setup:**
```bash
cd ~/agent-study/week3-agent-s
python -m venv .venv && source .venv/bin/activate
pip install gui-agents
brew install tesseract  # required dependency (macOS; on Linux, install tesseract-ocr via your package manager)
```

**API configuration:**
```bash
export OPENAI_API_KEY=<your-key>
export ANTHROPIC_API_KEY=<your-key>
export HF_TOKEN=<your-huggingface-token>
```

**Run Agent-S3 (if you have grounding model access):**
```bash
agent_s \
  --provider openai \
  --model gpt-4o \
  --ground_provider huggingface \
  --ground_url <your-endpoint-url> \
  --ground_model ui-tars-1.5-7b \
  --grounding_width 1920 \
  --grounding_height 1080
```

> **If you can't run it:** Read through `gui_agents/s3/cli_app.py` line by line and trace the execution flow. Understand what WOULD happen at each step.

- [ ] **📝 Homework:** Even if you can't run the full agent, build a minimal screenshot → analysis script:

  ```python
  # Take a screenshot, send it to a vision model, get a description of UI elements.
  # This exercises the same "visual grounding" concept, just simplified.
  ```

  - Save at `~/agent-study/week3-agent-s/hello_agent.py`

---

### Day 3 (Wednesday): Intermediate Build — Understanding Computer Use

**Work through:**
- [ ] Read `gui_agents/s3/agents/` — understand the multi-agent architecture
- [ ] Read `gui_agents/s3/core/` — how screenshots are captured and actions are executed
- [ ] Study the prompt templates in `gui_agents/s3/` — how the LLM is instructed
- [ ] Understand the bBoN strategy in `gui_agents/s3/bbon/`

**Key concepts:**
- How screenshots are processed and annotated for the LLM
- How the grounding model converts visual elements to coordinates
- How actions (click, type, scroll) are executed at the OS level
- Cross-platform differences (Linux/Mac/Windows)
- The local coding environment feature

- [ ] **📝 Homework:** Build something that uses the computer-use paradigm:
  - **Option A (with GPU):** Give Agent-S a simple task (open a browser, search for something, copy a result)
  - **Option B (without GPU):** Build a simplified "screen reader" agent that takes a screenshot, uses a vision model to understand the UI, and outputs a structured description of what's on screen + suggested next actions
  - Save at `~/agent-study/week3-agent-s/computer_use_demo/`

---

### Day 4 (Thursday): Advanced Patterns + Source Code Reading

**Read these source files (in order):**
1. `gui_agents/s3/cli_app.py` — Main entry point, execution loop
2. `gui_agents/s3/agents/` — Each agent role (planner, executor, grounding)
3. `gui_agents/s3/core/` — Screenshot capture, action execution, state management
4. `gui_agents/s3/bbon/` — Behavior Best-of-N implementation
5. `gui_agents/s1/` (briefly) — Compare the S1 architecture to S3 to see the evolution

**Explore the papers' techniques:**
- [ ] How does "experience-augmented hierarchical planning" work? (S1)
- [ ] What's the "Mixture of Grounding" approach? (S2)
- [ ] How does S3 achieve simplicity while improving performance?
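
bBoN itself is conceptually small: run N independent rollouts of the same task, then have a judge compare the resulting behavior narratives and keep the best one. A toy sketch of that shape — the rollout stub and "fewer steps is better" scoring rule are placeholders, not the paper's actual judge:

```python
import random

def rollout(task, seed):
    """One full agent attempt; a stub returning a transcript plus a step count."""
    rng = random.Random(seed)                        # deterministic per seed
    return {"transcript": f"attempt-{seed}: {task}", "steps": rng.randint(3, 10)}

def judge(candidate):
    """Placeholder judge: prefer the rollout that needed fewer steps.
    In the paper, a model compares behavior narratives instead."""
    return -candidate["steps"]

def best_of_n(task, n=4):
    candidates = [rollout(task, seed) for seed in range(n)]
    return max(candidates, key=judge)

best = best_of_n("rename a file")
print(best["transcript"])
```

The cost model follows directly: N rollouts means roughly N× the compute per task, traded for higher success rates.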

- [ ] **📝 Homework:** Write "What I'd Steal from Agent-S" at `~/agent-study/notes/week3-steal.md`
  - Focus on: The screenshot → grounding → action pipeline, the bBoN strategy, cross-platform abstractions
  - Think about: Could you add computer-use capabilities to a Pydantic-AI agent as a tool?

---

### Day 5 (Friday): Integration Project + Reflection

- [ ] **Build a mini-project:**
  - **Suggested:** A "screen monitoring" agent that periodically screenshots your desktop, uses a vision model to understand what's happening, and logs structured summaries (using Pydantic-AI for the structured output!)
  - **Alternative:** Build a browser automation agent using Playwright + a vision model (a simplified version of Agent-S's approach)
  - Save at `~/agent-study/week3-agent-s/integration_project/`

- [ ] **Write retrospective** at `~/agent-study/notes/week3-retro.md`
- [ ] **Update comparison matrix**

### 🎯 Key Questions:

1. What is the screenshot → grounding → action pipeline and why is it powerful?
2. Why does Agent-S need a separate grounding model (UI-TARS) in addition to the planning LLM?
3. What is Behavior Best-of-N and how does it improve performance by ~6%?
4. How is computer use fundamentally different from API-based agent frameworks?
5. What are the security implications of an agent that can control your mouse and keyboard?
6. What's the difference between Agent-S's approach and Anthropic's Computer Use or OpenAI's Operator?
7. When would you use computer-use agents vs. API-based agents? Give 3 examples of each.

---

## Week 4: GPT Researcher

> **Difficulty:** ⭐⭐ (Straightforward architecture, well-documented, familiar patterns)
> **Repo:** [github.com/assafelovic/gpt-researcher](https://github.com/assafelovic/gpt-researcher)
> **Stars:** 25k | **Language:** Python

### Why This Is Week 4

After 3 weeks of studying *how* agents work internally, this week is about studying a *complete, purpose-built* agent that does one thing extremely well: research. GPT Researcher is the best example of the "Plan-and-Solve + RAG" pattern — a design you'll reuse in your own projects.

### Resources

| Resource | Link |
|----------|------|
| 📖 Documentation | [docs.gptr.dev](https://docs.gptr.dev/docs/gpt-researcher/getting-started) |
| 💬 Discord | [Discord](https://discord.gg/QgZXvJAccX) |
| 📦 PyPI | [gpt-researcher](https://pypi.org/project/gpt-researcher/) |
| 📝 Blog: How it was built | [docs.gptr.dev/blog](https://docs.gptr.dev/blog/building-gpt-researcher) |
| 🎥 Demo | [YouTube](https://www.youtube.com/watch?v=f60rlc_QCxE) |
| 🔧 MCP Integration | [MCP Guide](https://docs.gptr.dev/docs/gpt-researcher/retrievers/mcp-configs) |
| 📜 Plan-and-Solve Paper | [arxiv.org/abs/2305.04091](https://arxiv.org/abs/2305.04091) |

### 🗂 Source Code Guide

```
gpt_researcher/
├── agent.py                   # ⭐ THE file. GPTResearcher class — the entire research orchestration
├── actions/                   # ⭐ Research actions (generate questions, search, scrape, synthesize)
│   ├── query_processing.py    # How research questions are generated from the user query
│   ├── web_search.py          # Web search execution
│   └── report_generation.py   # Final report synthesis
├── config/                    # Configuration management
│   └── config.py              # All configurable parameters
├── context/                   # ⭐ Context management — how gathered info is stored/retrieved
│   └── compression.py         # How context is compressed to fit token limits
├── document/                  # Document processing (PDF, web pages, etc.)
├── memory/                    # ⭐ Research memory — how the agent remembers what it's found
├── orchestrator/              # ⭐ Deep research — recursive tree exploration
│   └── agent/                 # Sub-agents for deep research mode
├── retrievers/                # ⭐ Web/local search implementations (Tavily, DuckDuckGo, MCP, etc.)
└── scraper/                   # Web scraping implementations
```

> **💡 Tip:** `agent.py` is the heart. It's one file, ~700 lines, and it contains the entire research orchestration. Read it top to bottom. Then read `actions/` to understand each step.

---

### Day 1 (Monday): Architecture Deep Dive

**Read:**
- [ ] Full [README](https://github.com/assafelovic/gpt-researcher)
- [ ] [How it was built](https://docs.gptr.dev/blog/building-gpt-researcher) — the design blog post
- [ ] [Getting Started](https://docs.gptr.dev/docs/gpt-researcher/getting-started)
- [ ] [Customization docs](https://docs.gptr.dev/docs/gpt-researcher/gptr/config)

**Understand the Plan-and-Solve architecture:**

```
User Query
  → Planner Agent: Generate N research questions
  → For each question:
      → Crawler Agent: Search web, gather sources
      → Summarizer: Extract relevant info from each source
      → Source tracker: Track citations
  → Publisher Agent: Aggregate all findings into a report
```

**Deep Research mode adds recursion:**

```
User Query → Generate sub-topics → For each sub-topic → Generate deeper sub-topics → ... → Aggregate bottom-up
```
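
To make the recursion concrete, here is a framework-agnostic sketch of the deep-research tree in plain Python. The `generate_subtopics` and `research_leaf` stubs are hypothetical placeholders (a real implementation would call an LLM and a search API); only the control flow mirrors the pattern described above.

```python
# Minimal sketch of recursive deep research (stubbed, no LLM calls).
# generate_subtopics / research_leaf are hypothetical placeholders.

def generate_subtopics(topic: str, breadth: int) -> list[str]:
    # Stand-in for an LLM call that proposes sub-topics.
    return [f"{topic} / subtopic {i}" for i in range(1, breadth + 1)]

def research_leaf(topic: str) -> str:
    # Stand-in for search + summarize on a single topic.
    return f"findings about {topic}"

def deep_research(topic: str, depth: int = 2, breadth: int = 2) -> str:
    """Explore sub-topics recursively, then aggregate bottom-up."""
    if depth == 0:
        return research_leaf(topic)
    children = [
        deep_research(sub, depth - 1, breadth)
        for sub in generate_subtopics(topic, breadth)
    ]
    # Aggregate child findings under the parent topic.
    return f"[{topic}]\n" + "\n".join(children)

report = deep_research("AI agent frameworks", depth=2, breadth=2)
```

Swapping the stubs for an LLM call and a search client turns this into a working (if naive) deep researcher; `depth` and `breadth` are the same two knobs the real Deep Research mode exposes.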

- [ ] **📝 Homework:** Write architecture summary at `~/agent-study/notes/week4-architecture.md`

---

### Day 2 (Tuesday): Hello World + Core Concepts

**Setup:**

```bash
cd ~/agent-study/week4-gpt-researcher
python -m venv .venv && source .venv/bin/activate
pip install gpt-researcher

# Set required API keys
export OPENAI_API_KEY=<your-key>
export TAVILY_API_KEY=<your-key>
```

**Run the simplest version:**

```python
from gpt_researcher import GPTResearcher
import asyncio

async def main():
    query = "What are the latest advancements in AI agent frameworks in 2025-2026?"
    researcher = GPTResearcher(query=query)
    research_result = await researcher.conduct_research()
    report = await researcher.write_report()
    print(report)

asyncio.run(main())
```

**Also try the web UI:**

```bash
git clone https://github.com/assafelovic/gpt-researcher.git
cd gpt-researcher
pip install -r requirements.txt
python -m uvicorn main:app --reload
# Visit http://localhost:8000
```

- [ ] **📝 Homework:** Build a minimal research agent from scratch — NO copy-paste
  - Save at `~/agent-study/week4-gpt-researcher/hello_researcher.py`

---

### Day 3 (Wednesday): Intermediate Build — Deep Research + MCP

**Focus: GPT Researcher's key differentiators — Deep Research mode and MCP integration**

**Work through:**
- [ ] [Deep Research docs](https://docs.gptr.dev/docs/gpt-researcher/gptr/deep-research)
- [ ] [MCP Integration Guide](https://docs.gptr.dev/docs/gpt-researcher/retrievers/mcp-configs)
- [ ] [Local document research](https://docs.gptr.dev/docs/gpt-researcher/gptr/local-docs)
- [ ] Run a Deep Research query and observe the recursive tree exploration

**Key concepts:**
- How Deep Research recursively explores sub-topics
- How MCP connects GPT Researcher to external data sources
- How context compression prevents token limit issues
- How source tracking and citations work
- The difference between web research and local document research
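
Source tracking follows from one habit: keep every finding paired with its origin URL so the report can cite and deduplicate sources. A toy illustration (all names here are mine, not GPT Researcher's internals):

```python
# Toy illustration of citation tracking during research.
# Findings carry their source URL so the report can cite them.

findings: list[tuple[str, str]] = []  # (snippet, source_url)

def record(snippet: str, url: str) -> None:
    findings.append((snippet, url))

record("Agents plan before acting.", "https://example.com/a")
record("RAG grounds answers in sources.", "https://example.com/b")
record("Agents plan before acting.", "https://example.com/a")  # repeat source

# Deduplicate sources while preserving first-seen order.
sources = list(dict.fromkeys(url for _, url in findings))
citations = {url: f"[{i}]" for i, url in enumerate(sources, start=1)}

# Inline citations in the body, plus a references section.
report_lines = [f"{snippet} {citations[url]}" for snippet, url in findings]
references = [f"{citations[url]} {url}" for url in sources]
```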

- [ ] **📝 Homework:** Build a research agent that uses GPT Researcher's unique capabilities:
  - **Must include:** MCP integration with at least one external source (e.g., GitHub MCP server)
  - **OR:** Research over local documents (PDFs, markdown files from your study notes)
  - **Bonus:** Use Deep Research mode for a complex topic
  - Save at `~/agent-study/week4-gpt-researcher/deep_research_demo.py`

---

### Day 4 (Thursday): Advanced Patterns + Source Code Reading

**Read these source files (in order):**
1. `gpt_researcher/agent.py` — The entire GPTResearcher class, top to bottom
2. `gpt_researcher/actions/query_processing.py` — How research questions are generated
3. `gpt_researcher/context/compression.py` — How context is managed within token limits
4. `gpt_researcher/orchestrator/` — Deep research recursive tree implementation
5. `gpt_researcher/retrievers/` — How different search providers are integrated

**Understand:**
- [ ] How the planner decomposes a query into research questions
- [ ] How the agent handles rate limiting and API failures
- [ ] How context compression works (this is critical for long research)
- [ ] How the orchestrator manages the recursive tree in Deep Research mode
- [ ] How the report generator synthesizes multiple sources into a coherent report
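
The idea behind context compression fits in a few lines: score each gathered chunk against the query, then greedily keep the best chunks until a token budget runs out. A simplified sketch, where word overlap stands in for embedding similarity and word counts stand in for real token counts:

```python
# Simplified context compression: keep the most relevant chunks
# that fit a token budget. Word overlap stands in for embeddings.

def relevance(query: str, chunk: str) -> int:
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def compress(query: str, chunks: list[str], budget: int) -> list[str]:
    ranked = sorted(chunks, key=lambda c: relevance(query, c), reverse=True)
    kept, used = [], 0
    for chunk in ranked:
        cost = len(chunk.split())  # crude "token" count
        if used + cost <= budget:
            kept.append(chunk)
            used += cost
    return kept

chunks = [
    "agent frameworks coordinate tools and memory",
    "the weather today is sunny and warm",
    "research agents search the web for agent frameworks",
]
context = compress("agent frameworks", chunks, budget=14)
```

GPT Researcher's real implementation uses embeddings and an actual tokenizer, but the budget-driven selection loop has the same shape.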

- [ ] **📝 Homework:** Write "What I'd Steal from GPT Researcher" at `~/agent-study/notes/week4-steal.md`
  - Focus on: Plan-and-Solve decomposition, context compression, source tracking, recursive exploration
  - Compare: how would you build "deep research" capability into a Pydantic-AI agent?

---

### Day 5 (Friday): Integration Project + Reflection

- [ ] **Build a mini-project:**
  - **Suggested:** A "competitive analysis" agent — given a company/product, it researches competitors, pricing, features, and generates a structured comparison report. Use GPT Researcher's engine + Pydantic-AI for structured output.
  - **Alternative:** Install GPT Researcher as a [Claude Skill](https://skills.sh/assafelovic/gpt-researcher/gpt-researcher) and use it in your Claude workflow
  - Save at `~/agent-study/week4-gpt-researcher/integration_project/`

- [ ] **Write retrospective** at `~/agent-study/notes/week4-retro.md`
- [ ] **Update comparison matrix**

### 🎯 Key Questions:

1. What is the Plan-and-Solve pattern and how does GPT Researcher implement it?
2. How does Deep Research differ from regular research? Draw the tree structure.
3. How does context compression prevent token limit issues during long research?
4. How does GPT Researcher track and cite sources?
5. What search providers does GPT Researcher support and how do you add a new one?
6. How could you combine GPT Researcher with Pydantic-AI for structured research outputs?
7. What are the limitations of automated research (hallucination, bias, recency)?

---

## Week 5: Yao

> **Difficulty:** ⭐⭐⭐⭐ (Go language, novel architecture, less documentation, paradigm shift)
> **Repo:** [github.com/YaoApp/yao](https://github.com/YaoApp/yao)
> **Stars:** 7.5k | **Language:** Go | **Runtime:** Single binary with V8 engine

### Why This Is Week 5

Yao is the most architecturally distinctive repo in the entire study. It's not a chatbot framework — it's an **autonomous agent engine** where agents are triggered by events, schedules, and emails. This is the only Go-based framework, the only one with event-driven architecture, and the only one that deploys as a single binary. If everything else is "AI assistant," Yao is "AI team member."

> **⚠️ Language Note:** This week requires Go. If you don't know Go, spend an extra hour on Day 1 doing the [Go Tour](https://go.dev/tour/). You don't need to be fluent — just enough to read the source code.

### Resources

| Resource | Link |
|----------|------|
| 🏠 Homepage | [yaoapps.com](https://yaoapps.com) |
| 📖 Documentation | [yaoapps.com/docs](https://yaoapps.com/docs) |
| 🚀 Quick Start | [Getting Started](https://yaoapps.com/docs/documentation/en-us/getting-started) |
| ✨ Why Yao? | [Why Yao](https://yaoapps.com/docs/documentation/en-us/getting-started/why-yao) |
| 🤖 Agent Examples | [YaoAgents/awesome](https://github.com/YaoAgents/awesome) |
| 📦 Install Script | `curl -fsSL https://yaoapps.com/install.sh \| bash` |
| 🐹 Go Tour (if needed) | [go.dev/tour](https://go.dev/tour/) |

### 🗂 Source Code Guide

```
yao/
├── engine/
│   └── process.go       # ⭐ Process engine — core concept in Yao
├── agent/               # ⭐ Agent framework — autonomous agent definitions
│   ├── agent.go         # Agent lifecycle, trigger modes, execution phases
│   └── triggers/        # Clock, Human, Event trigger implementations
├── runtime/
│   └── v8/              # ⭐ Built-in V8 JavaScript/TypeScript engine
├── rag/
│   └── graph/           # ⭐ Built-in GraphRAG implementation
├── mcp/                 # MCP integration
├── api/                 # HTTP server and REST API
├── model/               # ORM and database layer
└── cmd/
    └── yao/
        └── main.go      # Application entry point
```

> **💡 Tip:** Yao's DSL-based approach means you'll be reading `.yao` files (YAML-like definitions) as much as Go source code. The mental model is: you define agents as data (DSL), and the engine executes them.

---

### Day 1 (Monday): Architecture Deep Dive

**Read:**
- [ ] Full [README](https://github.com/YaoApp/yao)
- [ ] [Why Yao?](https://yaoapps.com/docs/documentation/en-us/getting-started/why-yao)
- [ ] [Documentation overview](https://yaoapps.com/docs)
- [ ] Skim the Go source: `cmd/yao/main.go` → `engine/process.go` → `agent/agent.go`

**Understand Yao's radical differences:**

| Traditional Agent | Yao Agent |
|-------------------|-----------|
| Entry point: chatbox | Entry point: email, events, schedules |
| Passive: you ask, it answers | Proactive: it works autonomously |
| Role: tool | Role: team member |

**The six-phase execution model:**

```
Inspiration → Goals → Tasks → Run → Deliver → Learn
```

**Three trigger modes:**
1. **Clock** — scheduled tasks (cron-like)
2. **Human** — triggered by email or messages
3. **Event** — triggered by webhooks or database changes
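
Conceptually, a Clock trigger is a scheduler loop around an agent function. Here is a framework-agnostic Python sketch (the clock is simulated so the example runs instantly; Yao itself configures triggers declaratively in its DSL, so none of these names are Yao APIs):

```python
# Concept sketch of a Clock trigger: run an agent whenever its
# scheduled interval has elapsed. Simulated clock, not Yao's API.

class ClockTrigger:
    def __init__(self, interval: int, agent) -> None:
        self.interval = interval  # seconds between runs
        self.agent = agent        # callable invoked on each firing
        self.last_run = 0

    def tick(self, now: int) -> None:
        # Fire the agent if the interval has elapsed since the last run.
        if now - self.last_run >= self.interval:
            self.agent(now)
            self.last_run = now

runs: list[int] = []
trigger = ClockTrigger(interval=60, agent=runs.append)

for t in range(0, 301, 30):  # simulate five minutes in 30-second ticks
    trigger.tick(t)
```

An Event trigger has the same shape, except `tick` is replaced by a webhook or message handler that fires the agent immediately; the point is that the agent runs without anyone typing into a chatbox.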

- [ ] **📝 Homework:** Write architecture summary at `~/agent-study/notes/week5-architecture.md`
  - Focus on: how the event-driven model is fundamentally different from request-response
  - Compare: 6-phase execution vs Pydantic-AI's run loop vs MS Agent Framework's graph

---

### Day 2 (Tuesday): Hello World + Core Concepts

**Setup:**

```bash
# Install Yao (single binary!)
curl -fsSL https://yaoapps.com/install.sh | bash

# Create a project
cd ~/agent-study/week5-yao
mkdir project && cd project
yao start   # First run creates project structure
# Visit http://127.0.0.1:5099
```

**Run your first process:**

```bash
yao run utils.app.Ping                      # Returns version
yao run scripts.tests.Hello 'Hello, Yao!'   # Run TypeScript
yao run models.tests.pet.Find 1 '::{}'      # Query database
```

**Understand core concepts:**
- [ ] **Processes** — functions that can be run directly or referenced in code
- [ ] **Models** — database models defined in `.mod.yao` files
- [ ] **Scripts** — TypeScript/JavaScript code executed by the built-in V8 engine
- [ ] **DSL** — Yao's declarative syntax for defining everything

- [ ] **📝 Homework:** Build the simplest Yao application from scratch:
  - Define a model, write a process, create a simple API endpoint
  - Save project at `~/agent-study/week5-yao/hello_project/`

---

### Day 3 (Wednesday): Intermediate Build — Event-Driven Agents

**Focus: What makes Yao unique — event-driven, proactive agents**

**Work through:**
- [ ] Agent configuration — defining agents with roles and triggers
- [ ] Setting up a scheduled (Clock) trigger
- [ ] Setting up an Event trigger (webhook → agent action)
- [ ] MCP integration — connecting external tools
- [ ] GraphRAG — how the built-in knowledge graph works

**Key concepts:**
- How agents are defined declaratively (vs. programmatically in Python frameworks)
- How the three trigger modes work in practice
- How agents learn from past executions (the "Learn" phase)
- How GraphRAG combines vector search with graph traversal
- Why a single binary matters for deployment
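
The GraphRAG idea (combine similarity search with graph traversal) can be shown in miniature: a toy similarity search finds a seed entity, then one hop of neighbor expansion pulls in related entities that pure vector search would miss. All names and data below are illustrative, not Yao's API:

```python
# Toy GraphRAG: similarity search finds seed nodes, then graph
# traversal expands to related entities. Purely illustrative.

docs = {
    "yao": "go engine for autonomous agents",
    "pydantic-ai": "python agent sdk with typed outputs",
    "graphrag": "knowledge graph retrieval for agents",
}
# Entity-relationship edges (the "graph" in GraphRAG).
edges = {
    "yao": ["graphrag"],
    "graphrag": ["yao"],
    "pydantic-ai": [],
}

def score(query: str, text: str) -> int:
    # Word overlap stands in for embedding similarity.
    return len(set(query.split()) & set(text.split()))

def graph_rag(query: str, hops: int = 1) -> set[str]:
    # 1) Vector-style retrieval: best-matching seed node.
    seed = max(docs, key=lambda name: score(query, docs[name]))
    result = {seed}
    # 2) Graph traversal: expand to neighbors for `hops` hops.
    frontier = {seed}
    for _ in range(hops):
        frontier = {n for node in frontier for n in edges[node]} - result
        result |= frontier
    return result

hits = graph_rag("go autonomous agents engine")
```

Here the graph hop surfaces `graphrag` even though the query never mentions knowledge graphs; that relationship-following is what vector-only RAG cannot do.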

- [ ] **📝 Homework:** Build an event-driven agent:
  - **Must include:** At least 2 different trigger modes (e.g., Clock + Event)
  - **Must include:** An agent that does something proactively (not just responding to a chat)
  - Example idea: an agent that checks an RSS feed on a schedule (Clock), processes new articles (Run), and stores summaries in the knowledge base (Learn/Deliver)
  - Save at `~/agent-study/week5-yao/event_agent/`

---

### Day 4 (Thursday): Advanced Patterns + Source Code Reading

**Read these source files (in order):**
1. `cmd/yao/main.go` — Application entry point, how the single binary initializes
2. `engine/process.go` — The process engine (core execution abstraction)
3. `agent/agent.go` — Agent lifecycle and execution phases
4. `runtime/v8/` — How the V8 engine is embedded for TypeScript support
5. `rag/graph/` — GraphRAG implementation (vector + graph hybrid search)

**Understand:**
- [ ] How Go's concurrency model (goroutines) enables event-driven agents
- [ ] How the V8 engine is embedded and used for TypeScript execution
- [ ] How GraphRAG combines embedding search with entity-relationship traversal
- [ ] How a single Go binary includes all these features without external dependencies

- [ ] **📝 Homework:** Write "What I'd Steal from Yao" at `~/agent-study/notes/week5-steal.md`
  - Focus on: event-driven architecture, single-binary deployment, GraphRAG, the DSL approach
  - Think about: could you add event-driven capabilities to a Python agent framework?

---

### Day 5 (Friday): Integration Project + Reflection

- [ ] **Build a mini-project:**
  - **Suggested:** A "daily briefing" agent — schedule it to run every morning, have it gather data from APIs (weather, calendar, news), process it, and output a structured briefing. Use the Clock trigger + MCP for external data.
  - **Alternative:** Build a webhook-triggered agent that processes incoming data and stores it in GraphRAG
  - Save at `~/agent-study/week5-yao/integration_project/`

- [ ] **Write retrospective** at `~/agent-study/notes/week5-retro.md`
- [ ] **Update comparison matrix**

### 🎯 Key Questions:

1. How does Yao's event-driven model differ from the request-response model of every other framework?
2. What are the three trigger modes and when would you use each?
3. What is the six-phase execution model and how does the "Learn" phase create a feedback loop?
4. Why is single-binary deployment a significant advantage? Where would you deploy Yao that you couldn't deploy Python frameworks?
5. How does Yao's built-in GraphRAG differ from vector-only RAG?
6. What does it mean that Yao embeds a V8 engine? What are the implications for extensibility?
7. What types of applications is Yao best suited for vs. worst suited for?

---

## Week 6: MetaGPT

> **Difficulty:** ⭐⭐⭐ (Large codebase, academic concepts, multi-agent complexity)
> **Repo:** [github.com/FoundationAgents/MetaGPT](https://github.com/FoundationAgents/MetaGPT)
> **Stars:** 63k | **Language:** Python | **Papers:** ICLR 2024 + many more

### Why This Is Week 6

MetaGPT is the OG multi-agent framework and the capstone of your framework deep dives. It introduces Standard Operating Procedures (SOPs) as the coordination mechanism — a genuinely novel idea that maps human organizational structures onto AI agents. By Week 6, you have enough context from the previous 5 frameworks to deeply appreciate what MetaGPT does differently.

### Resources

| Resource | Link |
|----------|------|
| 📖 Documentation | [docs.deepwisdom.ai](https://docs.deepwisdom.ai/main/en/) |
| 💬 Discord | [Discord](https://discord.gg/ZRHeExS6xv) |
| 📦 PyPI | [metagpt](https://pypi.org/project/metagpt/) |
| 🎯 MGX (commercial product) | [mgx.dev](https://mgx.dev/) |
| 📄 MetaGPT Paper (ICLR 2024) | [openreview.net](https://openreview.net/forum?id=VtmBAGCN7o) |
| 📄 AFlow Paper (ICLR 2025 Oral) | [openreview.net](https://openreview.net/forum?id=z5uVAKwmjf) |
| 📝 Agent 101 Tutorial | [Agent 101](https://docs.deepwisdom.ai/main/en/guide/tutorials/agent_101.html) |
| 📝 MultiAgent 101 | [MultiAgent 101](https://docs.deepwisdom.ai/main/en/guide/tutorials/multi_agent_101.html) |
| 🤗 HuggingFace Demo | [MetaGPT Space](https://huggingface.co/spaces/deepwisdom/MetaGPT-SoftwareCompany) |

### 🗂 Source Code Guide

```
metagpt/
├── roles/                      # ⭐ Role definitions — each role = one agent with a job
│   ├── role.py                 # ⭐ Base Role class — THE core abstraction
│   ├── architect.py            # Software architect agent
│   ├── engineer.py             # Software engineer agent
│   ├── product_manager.py      # Product manager agent
│   ├── project_manager.py      # Project manager agent
│   └── di/
│       └── data_interpreter.py # Data analysis agent
├── actions/                    # ⭐ Action definitions — what roles can do
│   ├── action.py               # Base Action class
│   ├── write_prd.py            # Write Product Requirements Document
│   ├── write_design.py         # Write system design
│   └── write_code.py           # Write code
├── team.py                     # ⭐ Team orchestration — how roles collaborate via SOPs
├── environment.py              # ⭐ Shared environment — message passing between roles
├── schema.py                   # Message schemas for inter-role communication
├── config2.py                  # Configuration management
├── base/                       # Base classes and utilities
├── memory/                     # Memory management for roles
├── software_company.py         # ⭐ The "software company" end-to-end pipeline
└── utils/
    └── project_repo.py         # Project repository management
```

> **💡 Tip:** The mental model is: **Role** (who) performs **Actions** (what) according to **SOPs** (how). Read `roles/role.py` first, then `actions/action.py`, then `team.py`. That's the holy trinity of MetaGPT.

---

### Day 1 (Monday): Architecture Deep Dive

**Read:**
- [ ] Full [README](https://github.com/FoundationAgents/MetaGPT)
- [ ] [Agent 101 Tutorial](https://docs.deepwisdom.ai/main/en/guide/tutorials/agent_101.html)
- [ ] [MultiAgent 101 Tutorial](https://docs.deepwisdom.ai/main/en/guide/tutorials/multi_agent_101.html)
- [ ] MetaGPT paper (abstract + Sections 1-3) — the SOP concept
- [ ] Skim the [AFlow paper](https://openreview.net/forum?id=z5uVAKwmjf) abstract — automated workflow generation

**Core philosophy:** `Code = SOP(Team)`

**Identify core abstractions:**
- **Role** — an agent with a specific job (PM, architect, engineer, etc.)
- **Action** — a discrete task a role can perform (write PRD, write code, etc.)
- **SOP** — Standard Operating Procedures that define the workflow between roles
- **Team** — the orchestrator that manages roles and message passing
- **Environment** — shared context where roles publish and subscribe to messages
- **Message** — typed communication between roles

**The "software company" pipeline:**

```
User Requirement
  → Product Manager (writes PRD)
  → Architect (writes system design)
  → Project Manager (creates task breakdown)
  → Engineer (writes code)
  → QA (tests code)
```
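
The pipeline above is essentially function composition: each role consumes the previous role's artifact and produces the next. A stripped-down sketch with stub roles in place of LLM-backed ones (the function names are mine):

```python
# "Code = SOP(Team)" in miniature: each role transforms the
# previous artifact. Stub functions stand in for LLM-backed roles.

def product_manager(requirement: str) -> str:
    return f"PRD for: {requirement}"

def architect(prd: str) -> str:
    return f"Design based on ({prd})"

def project_manager(design: str) -> str:
    return f"Tasks from ({design})"

def engineer(tasks: str) -> str:
    return f"Code implementing ({tasks})"

# The SOP is the fixed order of hand-offs between roles.
SOP = [product_manager, architect, project_manager, engineer]

def run_sop(requirement: str) -> str:
    artifact = requirement
    for role in SOP:
        artifact = role(artifact)
    return artifact

result = run_sop("snake game")
```

MetaGPT's real coordination is message-passing rather than a literal for-loop, but the SOP still plays the same role: it fixes who hands what to whom, in what order.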

- [ ] **📝 Homework:** Write architecture summary at `~/agent-study/notes/week6-architecture.md`
  - Explain the SOP model and how it maps to human organizations
  - Compare: SOP coordination vs Graph workflows (MS) vs Event-driven (Yao) vs Linear (Pydantic-AI)

---

### Day 2 (Tuesday): Hello World + Core Concepts

**Setup:**

```bash
cd ~/agent-study/week6-metagpt
conda create -n metagpt python=3.11 && conda activate metagpt
pip install --upgrade metagpt
metagpt --init-config   # Creates ~/.metagpt/config2.yaml
# Edit the config to add your API key
```

**Run the classic demo:**

```bash
metagpt "Create a snake game"   # This will generate a full project in ./workspace
```

**Also try programmatically:**

```python
from metagpt.software_company import generate_repo
from metagpt.utils.project_repo import ProjectRepo

repo: ProjectRepo = generate_repo("Create a simple calculator app")
print(repo)
```

**And try the Data Interpreter:**

```python
import asyncio
from metagpt.roles.di.data_interpreter import DataInterpreter

async def main():
    di = DataInterpreter()
    await di.run("Run data analysis on sklearn Iris dataset, include a plot")

asyncio.run(main())
```

- [ ] **📝 Homework:** Build a custom role from scratch — NO copy-paste:
  - Define a new `Role` subclass with custom `Action`s
  - Example: a "ResearchAnalyst" role that takes a topic and produces a structured analysis
  - Save at `~/agent-study/week6-metagpt/hello_role.py`

---

### Day 3 (Wednesday): Intermediate Build — Multi-Agent SOPs

**Focus: MetaGPT's unique capability — SOP-based multi-agent coordination**

**Work through:**
- [ ] [MultiAgent 101](https://docs.deepwisdom.ai/main/en/guide/tutorials/multi_agent_101.html)
- [ ] Look at the [Debate example](https://docs.deepwisdom.ai/main/en/guide/use_cases/multi_agent/debate.html)
- [ ] Understand how messages flow between roles via the Environment
- [ ] Understand how the SOP defines which role acts after which

**Key concepts:**
- How roles subscribe to message types from other roles
- How the Team orchestrator manages turn-taking
- How the Environment enables publish/subscribe communication
- How SOPs encode workflow logic without explicit graph definitions
- The difference between the "software company" SOP and custom SOPs
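
The publish/subscribe mechanism behind these concepts can be sketched in plain Python: roles register for message types, the environment fans each published message out to subscribers, and the SOP emerges from which types each role watches. The class below is a hypothetical simplification, not MetaGPT's real `Environment`:

```python
# Minimal pub/sub environment: roles subscribe to message types
# and react to publications. Hypothetical, heavily simplified.

from collections import defaultdict
from typing import Callable

class Environment:
    def __init__(self) -> None:
        self.subscribers: dict[str, list[Callable[[str], None]]] = defaultdict(list)
        self.log: list[tuple[str, str]] = []  # (msg_type, content) history

    def subscribe(self, msg_type: str, handler: Callable[[str], None]) -> None:
        self.subscribers[msg_type].append(handler)

    def publish(self, msg_type: str, content: str) -> None:
        self.log.append((msg_type, content))
        for handler in self.subscribers[msg_type]:
            handler(content)

env = Environment()
# Writer watches "research"; Editor watches "draft". That chain IS the SOP.
env.subscribe("research", lambda c: env.publish("draft", f"draft of {c}"))
env.subscribe("draft", lambda c: env.publish("final", f"edited {c}"))

env.publish("research", "notes on agents")
```

Note that no role calls another role directly; the chain exists only in the subscriptions, which is exactly what makes SOPs easy to rewire.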

- [ ] **📝 Homework:** Build a multi-agent system with a custom SOP:
  - **Must include:** At least 3 custom roles with different responsibilities
  - **Must include:** Custom message types between roles
  - **Must include:** A clear SOP workflow (Role A → Role B → Role C)
  - Example idea: a "content creation team" — Researcher (gathers info) → Writer (drafts article) → Editor (reviews and improves) → Publisher (formats final output)
  - Save at `~/agent-study/week6-metagpt/multi_agent_sop.py`

---

### Day 4 (Thursday): Advanced Patterns + Source Code Reading

**Read these source files (in order):**
1. `metagpt/roles/role.py` — Base Role class, how roles think and act
2. `metagpt/actions/action.py` — Base Action class, how actions execute
3. `metagpt/team.py` — Team orchestration, turn management
4. `metagpt/environment.py` — Message passing, pub/sub system
5. `metagpt/schema.py` — Message types and schemas

**Also explore:**
- [ ] `metagpt/roles/engineer.py` — how the Engineer role writes code (complex action chain)
- [ ] `metagpt/software_company.py` — the end-to-end pipeline
- [ ] `metagpt/memory/` — how roles maintain memory across turns
- [ ] `examples/` — AFlow and SPO implementations

**Advanced concepts:**
- [ ] How does AFlow (Automated Agentic Workflow Generation) work?
- [ ] What is SPO (Self-Supervised Prompt Optimization)?
- [ ] How does the Data Interpreter differ from the Software Company pipeline?

- [ ] **📝 Homework:** Write "What I'd Steal from MetaGPT" at `~/agent-study/notes/week6-steal.md`
  - Focus on: SOP-based coordination, the Role/Action abstraction, the message-passing environment
  - Reflect on: which coordination model do you prefer? Graph (MS) vs SOP (MetaGPT) vs Event (Yao)?

---

### Day 5 (Friday): Integration Project + Final Reflection

- [ ] **Build a mini-project:**
  - **Suggested:** A multi-agent system that takes a business idea and produces a full analysis: Market Researcher role → Business Analyst role → Financial Modeler role → Report Writer role. Each produces a structured output that feeds into the next.
  - Save at `~/agent-study/week6-metagpt/integration_project/`

- [ ] **Write final retrospective** at `~/agent-study/notes/week6-retro.md`
  - This one should be more comprehensive — reflect on ALL 6 weeks
  - What framework would you reach for first? When?
  - What surprised you most across the study?

- [ ] **Complete comparison matrix** — all 6 frameworks
- [ ] **Commit and push everything** to your study git repo

### 🎯 Key Questions:

1. What does "Code = SOP(Team)" mean concretely?
2. How does the Role/Action/SOP model map to real organizational structures?
3. How do messages flow between roles? What's the pub/sub mechanism?
4. What's the difference between MetaGPT's approach and MS Agent Framework's graph workflows?
5. How does the Data Interpreter feature differ from the Software Company pipeline?
6. What is AFlow and why was it accepted as an oral presentation at ICLR 2025?
7. When would you use MetaGPT vs simpler single-agent frameworks?
8. Across all 6 frameworks, which coordination model (linear/graph/SOP/event) do you think is most general?

---

## Week 7: ElizaOS

> **Timeline:** 1 week | **Difficulty:** ⭐⭐ | **Goal:** Learn agent deployment & multi-platform distribution
> **Repo:** [elizaOS/eliza](https://github.com/elizaOS/eliza) | ⭐ 17,476 | TypeScript
> **Why this week:** Weeks 1-6 taught you how to BUILD agents. This week teaches you how to DEPLOY them where users actually are.

### Why ElizaOS Makes the Cut

After a thorough debate (see the [deep dive analysis](./trending-repos-deep-dive.md)), ElizaOS earned its spot because:
- It's the **only deployment-focused platform** on the trending list — multi-platform routing (Discord, Telegram, Twitter, Farcaster) in one framework
- **17k stars** with active development and a large community
- The plugin architecture, character system, and platform adapters teach **real deployment patterns** you won't learn from any other framework in this study
- Knowing how to ship agents to where users live is as important as knowing how to build them

### Resources

| Resource | URL |
|----------|-----|
| **GitHub** | https://github.com/elizaOS/eliza |
| **Docs** | https://elizaos.github.io/eliza/ |
| **Discord** | https://discord.gg/elizaos |
| **Quickstart** | https://elizaos.github.io/eliza/docs/quickstart |

### Key Source Files to Read

| File | Why It Matters |
|------|---------------|
| `packages/core/src/runtime.ts` | The AgentRuntime — the central brain that coordinates everything |
| `packages/core/src/types.ts` | All the core interfaces (Character, Memory, Action, Provider, Evaluator) |
| `packages/plugin-discord/src/index.ts` | How a platform adapter is built — the Discord integration |
| `packages/plugin-telegram/src/index.ts` | Compare with the Discord adapter — spot the platform abstraction pattern |
| `packages/core/src/memory.ts` | Memory management — how agents maintain context across platforms |
| `agent/src/index.ts` | The entry point — how everything gets wired together |

---

### Day 1 (Monday): Architecture Deep Dive — The Deployment Platform

**Study (1-2 hrs):**
- Read the full README and quickstart docs
- Understand the core architecture:
  - **Character files** — how agent personalities are defined (JSON-based)
  - **AgentRuntime** — the central coordinator
  - **Plugins** — how platform adapters, actions, and providers are registered
  - **Actions vs Evaluators vs Providers** — the three extension points
  - **Memory** — how conversation state persists across platforms
- Study the plugin system architecture — how does one agent connect to Discord AND Telegram simultaneously?
- Understand the character file format — what can you configure?
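
The adapter pattern at the heart of this architecture can be sketched without TypeScript: each platform adapter normalizes its messages into one shape, a single runtime handles the message, and the reply goes back out through whichever adapter delivered it. A language-shifted, simplified illustration (not ElizaOS's actual interfaces):

```python
# Deployment-first sketch: one runtime, many platform adapters.
# Illustrative only; ElizaOS implements this pattern in TypeScript.

class Adapter:
    def __init__(self, platform: str) -> None:
        self.platform = platform
        self.outbox: list[str] = []  # messages sent back to this platform

    def receive(self, runtime: "Runtime", text: str) -> None:
        # Normalize the platform message, hand it to the runtime,
        # and deliver the reply back on the same platform.
        reply = runtime.handle(self.platform, text)
        self.outbox.append(reply)

class Runtime:
    def __init__(self, character: str) -> None:
        self.character = character
        self.memory: list[tuple[str, str]] = []  # shared across platforms

    def handle(self, platform: str, text: str) -> str:
        self.memory.append((platform, text))
        return f"{self.character}: got {text!r} via {platform}"

runtime = Runtime("Ada")
discord, telegram = Adapter("discord"), Adapter("telegram")
discord.receive(runtime, "hello")
telegram.receive(runtime, "hi again")
```

The design point to notice: the runtime never knows which platform it is on, and the shared `memory` is what lets one agent carry context from a Discord conversation into a Telegram one.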

**Key Questions:**
- How does ElizaOS route a message from Discord to the right agent and back?
- What's the difference between an Action, an Evaluator, and a Provider?
- How does the memory system work across platforms? Can an agent remember a Discord convo when talking on Telegram?
- How does the character file influence agent behavior vs hard-coded logic?

**Homework:**
- [ ] Write a 1-page architecture summary covering: runtime → plugins → adapters → memory → character system
- [ ] Draw a diagram showing message flow: User sends Discord message → ... → Agent responds
- [ ] Compare the architecture to Pydantic-AI's approach — what's different about a "deployment-first" vs "logic-first" framework?

---

### Day 2 (Tuesday): Hello World — Deploy an Agent to Discord

**Study (1-2 hrs):**
- Set up the ElizaOS development environment
  - Clone the repo, install deps (`pnpm install`)
- Create a Discord bot in the Discord Developer Portal (you'll need a test server)
- Set up your `.env` with the Discord bot token and an LLM API key
- Create a custom character file for your agent:
  - Define name, bio, personality traits, example conversations
  - Set the model provider and platform connections
- Run the agent locally and verify it responds in Discord

**Homework:**
- [ ] Create a character file from scratch (no copy-paste from examples) — give it a distinct personality
- [ ] Deploy the agent to your Discord test server and have a 10-message conversation with it
- [ ] Screenshot the conversation and note: What worked? What felt off? How does character configuration affect responses?

---

### Day 3 (Wednesday): Multi-Platform + Plugin System

**Study (1-2 hrs):**
- Add a second platform — connect the same agent to Telegram (or Twitter)
  - Same character, same agent, two platforms simultaneously
  - Observe: does memory carry across? How does the agent handle platform-specific features?
- Study the plugin architecture:
  - Read how `plugin-discord` and `plugin-telegram` are structured
  - Understand the `Plugin` interface — what does a plugin provide?
  - Look at how Actions work — these are the agent's "tools"
- Write a custom Action plugin:
  - Something simple: a weather lookup, a file reader, or a joke generator
  - Register it and verify your agent can use it on both platforms

**Homework:**
- [ ] Run your agent on 2 platforms simultaneously — screenshot both conversations
- [ ] Build a custom Action plugin from scratch and verify it works
- [ ] Write a comparison: how does ElizaOS's plugin system compare to Pydantic-AI's tool system and MetaGPT's action system? What are the trade-offs?
---
|
|
|
|
### Day 4 (Thursday): Source Code Reading + Advanced Patterns

**Study (1-2 hrs):**

- Read the key source files from the table above, focusing on:
  - **runtime.ts** — How does the AgentRuntime process an incoming message? What's the evaluation pipeline?
  - **types.ts** — What are all the interfaces? How extensible is the system?
  - **memory.ts** — How is conversation history stored and retrieved? What's the embedding strategy?
- Study advanced patterns:
  - Multi-agent setups — can you run multiple agents with different characters?
  - Custom evaluators — how do you add post-processing logic?
  - Custom providers — how do you inject context into every agent response?
- Compare deployment architecture decisions:
  - How does ElizaOS handle rate limiting across platforms?
  - How does it handle platform-specific message formatting (embeds, buttons, etc.)?
  - What's the error-handling strategy when a platform adapter fails?
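The "custom providers" pattern is worth sketching because it is the least obvious of the three: a provider is context injected into *every* turn, not a tool the agent chooses to call. The interface below is a simplified assumption (the real one also receives runtime and state objects), so confirm the actual signature in `types.ts` during your reading.

```typescript
// Simplified provider sketch: each provider contributes a string that the
// runtime prepends to the prompt on every turn. Shape is an assumption.
interface Message { text: string }

interface Provider {
  get: (msg: Message) => Promise<string>;
}

const clockProvider: Provider = {
  // Inject the current time so the agent can answer time-sensitive questions
  // without a tool call.
  get: async () => `Current UTC time: ${new Date().toISOString()}`,
};

// Stand-in for the runtime's context-composition step.
async function composeContext(providers: Provider[], msg: Message): Promise<string> {
  const parts = await Promise.all(providers.map((p) => p.get(msg)));
  return [...parts, `User: ${msg.text}`].join("\n");
}

composeContext([clockProvider], { text: "what time is it?" }).then(console.log);
```

Contrast this with Actions: providers run unconditionally and shape the prompt, while actions run conditionally and produce effects.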
**Homework:**

- [ ] Write a "What I'd Steal From ElizaOS" doc — which patterns are worth using in your own projects? Think:
  - Character file abstraction for agent personality
  - Plugin registration pattern
  - Platform adapter interface
  - Memory routing across services
- [ ] Identify the 3 biggest architectural weaknesses (every framework has them)

---
### Day 5 (Friday): Integration Project — Deploy a Week 1-6 Agent

**The real test:** Take an agent you built in Weeks 1-6 and deploy it to at least one chat platform using patterns learned from ElizaOS.

**Options (pick one):**

1. **Pydantic-AI agent → Discord:** Take your structured-output agent from Week 1 and wrap it in a Discord bot using ElizaOS's adapter patterns (or build your own minimal adapter inspired by their architecture)
2. **GPT Researcher → Telegram:** Take your research agent from Week 4 and make it accessible via Telegram — users send a topic, the agent researches and responds
3. **Multi-framework pipeline → Discord:** Take your Week 6 MetaGPT multi-agent setup and expose it through a Discord interface where users can kick off the SOP workflow
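For the "build your own minimal adapter" route in option 1, the essential move is a seam between agent logic and platform. Everything below is hypothetical scaffolding (no ElizaOS or discord.js APIs are used): a real Discord adapter would wrap discord.js behind the same two-method shape.

```typescript
// Hypothetical adapter seam: the agent never touches Discord directly,
// only this interface. A real adapter implements it with discord.js.
interface PlatformAdapter {
  onMessage(handler: (channelId: string, text: string) => Promise<void>): void;
  send(channelId: string, text: string): Promise<void>;
}

// In-memory fake adapter -- lets you test agent wiring without any platform.
class FakeAdapter implements PlatformAdapter {
  sent: string[] = [];
  private handler?: (channelId: string, text: string) => Promise<void>;
  onMessage(h: (channelId: string, text: string) => Promise<void>) { this.handler = h; }
  async send(_channelId: string, text: string) { this.sent.push(text); }
  async receive(channelId: string, text: string) { await this.handler?.(channelId, text); }
}

// Wire any agent (here a stub; in practice an HTTP call to your Week 1
// Pydantic-AI service) to whatever adapter you hand it.
function attachAgent(adapter: PlatformAdapter, agent: (q: string) => Promise<string>) {
  adapter.onMessage(async (channel, text) => adapter.send(channel, await agent(text)));
}

const fake = new FakeAdapter();
attachAgent(fake, async (q) => `analyzed: ${q}`);
fake.receive("general", "hello").then(() => console.log(fake.sent[0]));
```

The payoff: swapping `FakeAdapter` for a Discord or Telegram implementation changes nothing in the agent code, which is the core lesson of ElizaOS's platform layer.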
**Homework:**

- [ ] Deploy a previously-built agent to a real chat platform — it must respond to real messages
- [ ] Write a retrospective for ElizaOS:
  - **Strengths:** What does it do better than building your own deployment layer?
  - **Weaknesses:** Where is it limited or frustrating?
  - **When to use:** What type of project benefits most from ElizaOS?
  - **When to skip:** When is it overkill or the wrong tool?
- [ ] Update the comparison matrix with the ElizaOS column
- [ ] Answer: "If I were building a production agent for a client, would I use ElizaOS for deployment or roll my own? Why?"

---
### Key Questions You Should Be Able to Answer After Week 7

1. How does ElizaOS's character system differ from hardcoding agent personalities?
2. What's the plugin registration lifecycle — from `Plugin` definition to runtime availability?
3. How would you add a completely new platform (e.g., Slack, WhatsApp) to ElizaOS?
4. What are the trade-offs of a deployment-platform approach vs building bespoke platform integrations?
5. How does multi-platform memory work — and where does it break down?
6. When is ElizaOS the right choice vs a simple Discord.js bot?
7. What deployment patterns from ElizaOS would you steal for a custom agent pipeline?

---
## Week 8: Capstone Project

> **Timeline:** 1 week | **Difficulty:** ⭐⭐⭐⭐⭐ | **Goal:** Synthesize learnings from 3+ frameworks

### The Project: "Research → Analyze → Act" Pipeline

Build a system that combines at least 3 of the frameworks you studied:

#### Recommended Architecture

```
┌─────────────────────────────────────────────────────────┐
│                    Capstone Pipeline                    │
│                                                         │
│  ┌──────────────┐    ┌──────────────┐    ┌────────────┐ │
│  │ GPT          │    │ Pydantic-AI  │    │ MetaGPT OR │ │
│  │ Researcher   │───▶│ Structured   │───▶│ MS Agent   │ │
│  │ (Research)   │    │ Analysis     │    │ Framework  │ │
│  │              │    │ Agent        │    │ (Execute)  │ │
│  └──────────────┘    └──────────────┘    └────────────┘ │
│                                                         │
│  Optional additions:                                    │
│  - Agent-S for browser automation during research       │
│  - Yao for scheduling periodic re-research              │
└─────────────────────────────────────────────────────────┘
```
#### Requirements

- [ ] **Stage 1: Research** — Use GPT Researcher to conduct deep research on a topic
- [ ] **Stage 2: Analysis** — Use Pydantic-AI to process research into structured data with validated output types
- [ ] **Stage 3: Action** — Use MetaGPT's multi-agent SOP OR MS Agent Framework's graph workflow to generate deliverables from the structured analysis
- [ ] **Integration:** The output of one stage must be the input to the next
- [ ] **Documentation:** Write a README explaining your architecture and design decisions
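The integration requirement — each stage's output typed as the next stage's input — is worth pinning down as a chaining contract before writing any glue. This sketch uses TypeScript purely for illustration (all type and field names are invented); in the real capstone each stage body would invoke the Python frameworks via CLI or HTTP, but the typed seams are the point.

```typescript
// Each stage is a typed async function; the pipeline only composes them.
type Stage<I, O> = (input: I) => Promise<O>;

// Invented intermediate shapes -- your real schemas replace these.
interface ResearchReport { topic: string; findings: string[] }
interface Analysis { topic: string; keyPoints: string[]; confidence: number }
interface Deliverable { title: string; body: string }

const research: Stage<string, ResearchReport> = async (topic) => ({
  topic,
  findings: [`stub finding about ${topic}`], // real: call GPT Researcher
});

const analyze: Stage<ResearchReport, Analysis> = async (r) => ({
  topic: r.topic,
  keyPoints: r.findings.map((f) => f.toUpperCase()), // real: Pydantic-AI agent
  confidence: 0.5,
});

const act: Stage<Analysis, Deliverable> = async (a) => ({
  title: `Report: ${a.topic}`,
  body: a.keyPoints.join("\n"), // real: MetaGPT SOP or MS Agent Framework
});

// Generic composition keeps every seam type-checked: a stage whose output
// doesn't match the next stage's input fails at compile time, not runtime.
function chain<A, B, C>(f: Stage<A, B>, g: Stage<B, C>): Stage<A, C> {
  return async (x) => g(await f(x));
}

const pipeline = chain(chain(research, analyze), act);
pipeline("agent frameworks").then((d) => console.log(d.title));
```

If Stage 2 uses Pydantic-AI's validated output types, the JSON crossing that seam is exactly the `Analysis` shape — which is what makes the handoff to Stage 3 reliable.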
#### Stretch Goals

- [ ] Add a Yao scheduled trigger so the pipeline runs daily/weekly
- [ ] Deploy the entire pipeline to Discord/Telegram using ElizaOS patterns from Week 7
- [ ] Add observability (Logfire or OpenTelemetry)
- [ ] Add a web UI (even simple HTML)
- [ ] Use MCP to connect components
- [ ] Add Agent-S for any browser automation steps
#### Deliverables

- [ ] Working code at `~/agent-study/capstone/`
- [ ] `README.md` with architecture diagram and setup instructions
- [ ] `DECISIONS.md` explaining why you chose each framework for each stage
- [ ] `RETROSPECTIVE.md` — final thoughts on the 7-week journey
#### Suggested Topics for the Pipeline

1. **Competitor Analysis Tool** — Research competitors → Structure findings → Generate strategic recommendations
2. **Daily News Briefing** — Research trending topics → Analyze relevance → Generate personalized newsletter
3. **Technical Due Diligence** — Research a technology → Structured pros/cons → Multi-perspective report (architect, PM, engineer roles)
4. **Market Research Report** — Research a market → Structured data extraction → Executive summary + detailed report

---
## Appendix: Comparison Matrix Template

Save this at `~/agent-study/comparison-matrix/matrix.md` and fill it in weekly:

```markdown
# AI Agent Framework Comparison Matrix

| Dimension | Pydantic-AI | MS Agent Framework | Agent-S | GPT Researcher | Yao | MetaGPT | ElizaOS |
|-----------|-------------|-------------------|---------|----------------|-----|---------|---------|
| **Language** | Python | Python + .NET | Python | Python | Go | Python | TypeScript |
| **Stars** | 14.6k | 7k | 9.6k | 25k | 7.5k | 63k | 17k |
| **Agent Definition** | | | | | | | |
| **Tool Integration** | | | | | | | |
| **Multi-Agent Coord.** | | | | | | | |
| **Error Handling** | | | | | | | |
| **Observability** | | | | | | | |
| **Type Safety** | | | | | | | |
| **DX / Ergonomics** | | | | | | | |
| **Production Readiness** | | | | | | | |
| **Unique Superpower** | | | | | | | |
| **Biggest Weakness** | | | | | | | |
| **Best Use Case** | | | | | | | |
| **Would I Use For...** | | | | | | | |
| **Overall Rating (1-10)** | | | | | | | |
```

---
## 📊 Week-by-Week Schedule Overview

| Week | Framework | Focus | Difficulty | Key Deliverables |
|------|-----------|-------|------------|------------------|
| 0 | Prep | Setup & background reading | ⭐ | Environment ready, papers skimmed |
| 1 | Pydantic-AI | Type-safe agents, DI, structured output | ⭐⭐ | Architecture doc, 3 agents, steal doc |
| 2 | MS Agent Framework | Graph workflows, DevUI, enterprise patterns | ⭐⭐⭐ | Graph workflow, DevUI screenshots, steal doc |
| 3 | Agent-S | Computer use, visual grounding, screenshots | ⭐⭐⭐⭐ | Computer use demo, architecture analysis |
| 4 | GPT Researcher | Deep research, Plan-and-Solve, RAG | ⭐⭐ | Research agent, MCP integration |
| 5 | Yao | Event-driven agents, Go, single binary, GraphRAG | ⭐⭐⭐⭐ | Event-driven agent, DSL exploration |
| 6 | MetaGPT | SOPs, multi-agent teams, roles/actions | ⭐⭐⭐ | Multi-agent SOP, comparison matrix |
| 7 | ElizaOS | Deployment, multi-platform distribution, plugins | ⭐⭐ | Multi-platform agent, custom plugin, deploy a Week 1-6 agent |
| 8 | Capstone | Integrate 3+ frameworks | ⭐⭐⭐⭐⭐ | Working pipeline, docs, retrospective |

---
## 🏁 Success Criteria

After completing this study plan, you should be able to:

1. **Explain** the architecture of each framework from memory (whiteboard test)
2. **Build** a production-grade agent with Pydantic-AI from scratch
3. **Design** a graph workflow for a complex multi-step process
4. **Understand** computer-use agent architecture and its limitations
5. **Implement** a Plan-and-Solve research pipeline
6. **Compare** event-driven vs request-response agent architectures
7. **Deploy** an agent to Discord/Telegram and understand multi-platform routing patterns
8. **Choose** the right framework for a given problem with clear reasoning
9. **Read** any agent framework's source code and quickly identify its core abstractions

> *"The goal isn't to memorize APIs. It's to build intuition for how agent systems are designed, so you can build your own or extend existing ones with confidence."*

---

*Generated by Clawdbot | February 4, 2026*