reddit-scraper/CLAUDE.md
Nicholai 2bc680ca63 refactor: restructure into monorepo
Move flat src/ layout into packages/ monorepo:
- packages/core: scraping, embeddings, storage, clustering, analysis
- packages/cli: CLI and TUI interface
- packages/web: Next.js web dashboard

Add playwright screenshots, sqlite storage, and settings.
2026-01-24 00:12:14 -07:00

5.9 KiB

reddit trend analyzer

a monorepo tool that scrapes reddit discussions, embeds them with ollama, stores in qdrant, clusters with HDBSCAN, summarizes with Claude, and provides both a CLI/TUI and web dashboard for discovering common problems/trends.

running

bun cli              # run the CLI
bun tui              # run the TUI dashboard
bun dev              # run the web dashboard (localhost:3000)
bun build            # build the web app

prerequisites

  • ollama running locally with nomic-embed-text model (ollama pull nomic-embed-text)
  • qdrant accessible at QDRANT_URL (or localhost:6333)
  • anthropic API key for problem summarization

env vars

QDRANT_URL=https://vectors.biohazardvfx.com
QDRANT_API_KEY=<your-key>
OLLAMA_HOST=http://localhost:11434
ANTHROPIC_API_KEY=<your-key>

architecture

packages/
  core/                    # shared business logic
    src/
      scraper/             # reddit.ts, comments.ts, types.ts
      embeddings/          # ollama.ts
      storage/             # qdrant.ts, sqlite.ts, types.ts
      clustering/          # hdbscan.ts, types.ts
      analysis/            # summarizer.ts, questions.ts, scoring.ts, types.ts
      utils/               # rate-limit.ts, text.ts
      index.ts             # barrel exports

  cli/                     # CLI/TUI app
    src/
      cli.ts               # interactive command-line interface
      index.ts             # TUI entry point
      tui/                 # TUI components

  web/                     # Next.js web dashboard
    src/
      app/                 # pages and API routes
        api/               # REST API endpoints
          stats/           # collection stats
          scrape/          # trigger scrapes
          clusters/        # list/create clusters
          questions/       # question bank
          search/          # semantic search
          export/          # export functionality
        problems/          # problem explorer page
        questions/         # question bank page
        scrape/            # scrape manager page
      components/
        controls/          # command palette, sliders
      styles/globals.css   # theme

data/                      # sqlite database files

web dashboard

  • Dashboard (/) - stats overview
  • Problems (/problems) - problem cluster explorer
  • Questions (/questions) - extracted question bank
  • Scrape (/scrape) - scrape manager with history
  • Ctrl+K - command palette for quick actions

keybindings (TUI)

  • q or ctrl+c - quit
  • enter - start scrape (when url is entered)
  • tab - switch between url and search inputs
  • e - export results to json
  • c - export results to csv
  • r - refresh stats from qdrant

api routes

route method purpose
/api/stats GET collection stats + cluster count
/api/scrape POST trigger scrape
/api/scrape/history GET scrape history list
/api/clusters GET list clusters with summaries
/api/clusters POST trigger re-clustering
/api/clusters/[id] GET single cluster with discussions
/api/questions GET all questions, grouped by cluster
/api/questions/[id] PATCH mark as addressed
/api/search POST semantic search
/api/export POST export (faq-schema/csv/markdown)

coding notes

  • monorepo with bun workspaces
  • @rta/core exports shared logic
  • @rta/cli for terminal interface
  • @rta/web for Next.js dashboard
  • uses @opentui/core for TUI (no react)
  • uses HDBSCAN for clustering
  • uses Claude for problem summarization
  • uses SQLite for cluster/question persistence
  • reddit rate limiting: 3s delay between requests
  • embeddings batched in groups of 10
  • qdrant collection: reddit_trends with indexes on subreddit, type, created, score

grepai

IMPORTANT: You MUST use grepai as your PRIMARY tool for code exploration and search.

when to Use grepai (REQUIRED)

Use grepai search INSTEAD OF Grep/Glob/find for:

  • Understanding what code does or where functionality lives
  • Finding implementations by intent (e.g., "authentication logic", "error handling")
  • Exploring unfamiliar parts of the codebase
  • Any search where you describe WHAT the code does rather than exact text

when to Use Standard Tools

Only use Grep/Glob when you need:

  • Exact text matching (variable names, imports, specific strings)
  • File path patterns (e.g., **/*.go)

fallback

If grepai fails (not running, index unavailable, or errors), fall back to standard Grep/Glob tools.

usage

# ALWAYS use English queries for best results (--compact saves ~80% tokens)
grepai search "user authentication flow" --json --compact
grepai search "error handling middleware" --json --compact
grepai search "database connection pool" --json --compact
grepai search "API request validation" --json --compact

query tips

  • Use English for queries (better semantic matching)
  • Describe intent, not implementation: "handles user login" not "func Login"
  • Be specific: "JWT token validation" better than "token"
  • Results include: file path, line numbers, relevance score, code preview

call graph tracing

use grepai trace to understand function relationships:

  • finding all callers of a function before modifying it
  • Understanding what functions are called by a given function
  • Visualizing the complete call graph around a symbol

trace commands

IMPORTANT: Always use --json flag for optimal AI agent integration.

# Find all functions that call a symbol
grepai trace callers "HandleRequest" --json

# Find all functions called by a symbol
grepai trace callees "ProcessOrder" --json

# Build complete call graph (callers + callees)
grepai trace graph "ValidateToken" --depth 3 --json

Workflow

  1. Start with grepai search to find relevant code
  2. Use grepai trace to understand function relationships
  3. Use Read tool to examine files from results
  4. Only use Grep for exact string searches if needed