reddit trend analyzer === a tool for discovering common problems and questions in reddit communities to inform content strategy and tool development. core goal --- find what people struggle with most -> create content/tools that solve those problems -> organic SEO growth tech stack --- - vector database: qdrant - embeddings: nomic-embed-text (ollama) - framework: next.js - components: shadcn - charts: recharts (simple, shadcn-compatible) - theme: shadcn tokens from globals.css inline theme ONLY data pipeline --- ``` reddit scrape -> text cleaning -> embedding -> qdrant storage | clustering (HDBSCAN) | problem extraction (LLM) | frequency + engagement scoring ``` core features --- **1. data ingestion** existing CLI handles this well: - scrape subreddit posts + comments - embed with nomic-embed-text - store in qdrant with metadata (score, created, subreddit, type) **2. problem clustering** the key feature. group similar discussions to surface recurring themes. - cluster embeddings using HDBSCAN (density-based, handles noise well) - extract cluster centroids as topic anchors - LLM pass to generate human-readable problem statements from each cluster - rank clusters by: - size (discussion count) - total engagement (sum of upvotes) - recency (still being talked about?) output example: ``` | problem | discussions | upvotes | last seen | |----------------------------------------------|-------------|---------|-----------| | users struggle with X when doing Y | 47 | 2.3k | 2d ago | | confusion about how to configure Z | 31 | 890 | 1w ago | | no good free alternative to [competitor] | 28 | 1.1k | 3d ago | ``` **3. question extraction** pull out actual questions people ask. - pattern matching: "how do I", "why does", "is there a way to", "what's the best", etc. - deduplicate semantically similar questions (vector similarity > 0.9) - rank by engagement - group under parent problem clusters output: FAQ-ready list for blog posts, docs, or schema markup **4. search + explore** - semantic search across all scraped content - filter by: subreddit, date range, min upvotes, type (post/comment) - click through to original reddit discussions **5. export** - problem clusters as markdown content briefs - questions as FAQ schema (json-ld ready) - csv for spreadsheet analysis - raw json for custom processing dashboard views --- **home / stats** simple overview: - total posts/comments in db - subreddits being tracked - problem clusters identified - recent scrape activity **problem explorer** (main view) sortable/filterable table of problem clusters: - columns: problem summary, discussion count, total upvotes, avg sentiment, last active - expand row -> sample discussions + extracted questions - select multiple -> bulk export as content briefs - search within problems **question bank** all extracted questions: - grouped by parent problem cluster (collapsible) - search/filter - copy as json-ld FAQ schema - mark as "addressed" when content exists **scrape manager** - list of tracked subreddits - manual scrape trigger - scrape history with stats - add/remove subreddits To give the user "Ultimate Control," the dashboard should include: 1. **Similarity Sensitivity Slider:** A global control that adjusts how strict the vector database is. Lower similarity = more broad, creative connections. Higher similarity = more specific, literal results. 2. **The "Impact Score" Weighting:** Allow users to toggle what "Importance" means to them. Is it **Upvote Count**? **Sentiment Extremity**? Or **Topic Velocity**? Adjusting these weights should re-order the "Competitor Hijack" table in real-time. 3. **Command Palette:** Instead of clicking through menus, a "Ctrl + K" command bar allows the user to type "Find gaps in comparison intent" to instantly update the visualizations. implementation phases --- **phase 1: clustering + extraction (backend)** - [ ] add HDBSCAN clustering to pipeline - [ ] LLM integration for problem summarization (claude or local) - [ ] question extraction with pattern matching + dedup - [ ] store clusters in qdrant (or sqlite sidecar) - [ ] CLI commands: `cluster`, `problems`, `questions` **phase 2: web UI** - [ ] next.js app with shadcn - [ ] problem explorer table (tanstack table) - [ ] question bank view - [ ] semantic search - [ ] export functionality - [ ] basic stats dashboard **phase 3: polish** - [ ] scheduled/recurring scrapes - [ ] better semantic deduplication - [ ] sentiment scoring (optional) - [ ] "addressed" tracking (link to published content) env vars --- ``` QDRANT_URL=https://vectors.biohazardvfx.com QDRANT_API_KEY= OLLAMA_HOST=http://localhost:11434 ANTHROPIC_API_KEY= # for problem summarization ``` success criteria --- tool is working if: - we can identify 10+ distinct problems from a subreddit scrape - problem summaries are actionable (could write a blog post about it) - question extraction gives us real FAQs people are asking - export format is immediately usable for content planning everything else is nice-to-have. --- theme (globals.css) --- ```css :root { --background: oklch(0.9551 0 0); --foreground: oklch(0.3211 0 0); --card: oklch(0.9702 0 0); --card-foreground: oklch(0.3211 0 0); --popover: oklch(0.9702 0 0); --popover-foreground: oklch(0.3211 0 0); --primary: oklch(0.4891 0 0); --primary-foreground: oklch(1.0000 0 0); --secondary: oklch(0.9067 0 0); --secondary-foreground: oklch(0.3211 0 0); --muted: oklch(0.8853 0 0); --muted-foreground: oklch(0.5103 0 0); --accent: oklch(0.8078 0 0); --accent-foreground: oklch(0.3211 0 0); --destructive: oklch(0.5594 0.1900 25.8625); --destructive-foreground: oklch(1.0000 0 0); --border: oklch(0.8576 0 0); --input: oklch(0.9067 0 0); --ring: oklch(0.4891 0 0); --chart-1: oklch(0.4891 0 0); --chart-2: oklch(0.4863 0.0361 196.0278); --chart-3: oklch(0.6534 0 0); --chart-4: oklch(0.7316 0 0); --chart-5: oklch(0.8078 0 0); --sidebar: oklch(0.9370 0 0); --sidebar-foreground: oklch(0.3211 0 0); --sidebar-primary: oklch(0.4891 0 0); --sidebar-primary-foreground: oklch(1.0000 0 0); --sidebar-accent: oklch(0.8078 0 0); --sidebar-accent-foreground: oklch(0.3211 0 0); --sidebar-border: oklch(0.8576 0 0); --sidebar-ring: oklch(0.4891 0 0); --font-sans: Montserrat, sans-serif; --font-serif: Georgia, serif; --font-mono: Fira Code, monospace; --radius: 0.35rem; --shadow-x: 0px; --shadow-y: 2px; --shadow-blur: 0px; --shadow-spread: 0px; --shadow-opacity: 0.15; --shadow-color: hsl(0 0% 20% / 0.1); --shadow-2xs: 0px 2px 0px 0px hsl(0 0% 20% / 0.07); --shadow-xs: 0px 2px 0px 0px hsl(0 0% 20% / 0.07); --shadow-sm: 0px 2px 0px 0px hsl(0 0% 20% / 0.15), 0px 1px 2px -1px hsl(0 0% 20% / 0.15); --shadow: 0px 2px 0px 0px hsl(0 0% 20% / 0.15), 0px 1px 2px -1px hsl(0 0% 20% / 0.15); --shadow-md: 0px 2px 0px 0px hsl(0 0% 20% / 0.15), 0px 2px 4px -1px hsl(0 0% 20% / 0.15); --shadow-lg: 0px 2px 0px 0px hsl(0 0% 20% / 0.15), 0px 4px 6px -1px hsl(0 0% 20% / 0.15); --shadow-xl: 0px 2px 0px 0px hsl(0 0% 20% / 0.15), 0px 8px 10px -1px hsl(0 0% 20% / 0.15); --shadow-2xl: 0px 2px 0px 0px hsl(0 0% 20% / 0.38); --tracking-normal: 0em; --spacing: 0.25rem; } .dark { --background: oklch(0.2178 0 0); --foreground: oklch(0.8853 0 0); --card: oklch(0.2435 0 0); --card-foreground: oklch(0.8853 0 0); --popover: oklch(0.2435 0 0); --popover-foreground: oklch(0.8853 0 0); --primary: oklch(0.7058 0 0); --primary-foreground: oklch(0.2178 0 0); --secondary: oklch(0.3092 0 0); --secondary-foreground: oklch(0.8853 0 0); --muted: oklch(0.2850 0 0); --muted-foreground: oklch(0.5999 0 0); --accent: oklch(0.3715 0 0); --accent-foreground: oklch(0.8853 0 0); --destructive: oklch(0.6591 0.1530 22.1703); --destructive-foreground: oklch(1.0000 0 0); --border: oklch(0.3290 0 0); --input: oklch(0.3092 0 0); --ring: oklch(0.7058 0 0); --chart-1: oklch(0.7058 0 0); --chart-2: oklch(0.6714 0.0339 206.3482); --chart-3: oklch(0.5452 0 0); --chart-4: oklch(0.4604 0 0); --chart-5: oklch(0.3715 0 0); --sidebar: oklch(0.2393 0 0); --sidebar-foreground: oklch(0.8853 0 0); --sidebar-primary: oklch(0.7058 0 0); --sidebar-primary-foreground: oklch(0.2178 0 0); --sidebar-accent: oklch(0.3715 0 0); --sidebar-accent-foreground: oklch(0.8853 0 0); --sidebar-border: oklch(0.3290 0 0); --sidebar-ring: oklch(0.7058 0 0); --font-sans: Inter, sans-serif; --font-serif: Georgia, serif; --font-mono: Fira Code, monospace; --radius: 0.35rem; --shadow-x: 0px; --shadow-y: 2px; --shadow-blur: 0px; --shadow-spread: 0px; --shadow-opacity: 0.15; --shadow-color: hsl(0 0% 20% / 0.1); --shadow-2xs: 0px 2px 0px 0px hsl(0 0% 20% / 0.07); --shadow-xs: 0px 2px 0px 0px hsl(0 0% 20% / 0.07); --shadow-sm: 0px 2px 0px 0px hsl(0 0% 20% / 0.15), 0px 1px 2px -1px hsl(0 0% 20% / 0.15); --shadow: 0px 2px 0px 0px hsl(0 0% 20% / 0.15), 0px 1px 2px -1px hsl(0 0% 20% / 0.15); --shadow-md: 0px 2px 0px 0px hsl(0 0% 20% / 0.15), 0px 2px 4px -1px hsl(0 0% 20% / 0.15); --shadow-lg: 0px 2px 0px 0px hsl(0 0% 20% / 0.15), 0px 4px 6px -1px hsl(0 0% 20% / 0.15); --shadow-xl: 0px 2px 0px 0px hsl(0 0% 20% / 0.15), 0px 8px 10px -1px hsl(0 0% 20% / 0.15); --shadow-2xl: 0px 2px 0px 0px hsl(0 0% 20% / 0.38); } @theme inline { --color-background: var(--background); --color-foreground: var(--foreground); --color-card: var(--card); --color-card-foreground: var(--card-foreground); --color-popover: var(--popover); --color-popover-foreground: var(--popover-foreground); --color-primary: var(--primary); --color-primary-foreground: var(--primary-foreground); --color-secondary: var(--secondary); --color-secondary-foreground: var(--secondary-foreground); --color-muted: var(--muted); --color-muted-foreground: var(--muted-foreground); --color-accent: var(--accent); --color-accent-foreground: var(--accent-foreground); --color-destructive: var(--destructive); --color-destructive-foreground: var(--destructive-foreground); --color-border: var(--border); --color-input: var(--input); --color-ring: var(--ring); --color-chart-1: var(--chart-1); --color-chart-2: var(--chart-2); --color-chart-3: var(--chart-3); --color-chart-4: var(--chart-4); --color-chart-5: var(--chart-5); --color-sidebar: var(--sidebar); --color-sidebar-foreground: var(--sidebar-foreground); --color-sidebar-primary: var(--sidebar-primary); --color-sidebar-primary-foreground: var(--sidebar-primary-foreground); --color-sidebar-accent: var(--sidebar-accent); --color-sidebar-accent-foreground: var(--sidebar-accent-foreground); --color-sidebar-border: var(--sidebar-border); --color-sidebar-ring: var(--sidebar-ring); --font-sans: var(--font-sans); --font-mono: var(--font-mono); --font-serif: var(--font-serif); --radius-sm: calc(var(--radius) - 4px); --radius-md: calc(var(--radius) - 2px); --radius-lg: var(--radius); --radius-xl: calc(var(--radius) + 4px); --shadow-2xs: var(--shadow-2xs); --shadow-xs: var(--shadow-xs); --shadow-sm: var(--shadow-sm); --shadow: var(--shadow); --shadow-md: var(--shadow-md); --shadow-lg: var(--shadow-lg); --shadow-xl: var(--shadow-xl); --shadow-2xl: var(--shadow-2xl); } ```