
Implementation Plan: Claude's Constitution Analysis System

Overview

A comprehensive quantitative and semantic analysis of Claude's Constitution with an interactive HTML query interface, using Python for analysis and nomic-embed-text via Ollama for semantic embeddings.


Phase 1: Architecture & Data Structures

File Structure

```
/home/nicholai/.agents/constitution/
├── claudes-constitution.md            # Source document
├── constitution_analysis/
│   ├── analysis/
│   │   ├── main.py                    # Main analysis script
│   │   ├── data_processor.py          # Document parsing & extraction
│   │   ├── quantitative.py            # Statistical analysis
│   │   ├── semantic_analyzer.py       # Embeddings & similarity
│   │   └── metadata_builder.py        # Metadata generation
│   ├── notebooks/
│   │   └── constitution_analysis.ipynb
│   ├── data/
│   │   ├── constitution.db            # SQLite database with embeddings
│   │   ├── variables.json             # Structured variable data
│   │   ├── statistics.json            # Quantitative metrics
│   │   └── embeddings_meta.json       # Embeddings metadata
│   └── web/
│       ├── index.html                 # Main interface
│       ├── css/
│       │   └── styles.css             # Dark mode styles
│       └── js/
│           ├── app.js                 # Main app logic
│           ├── d3-graph.js            # Network visualization
│           └── charts.js              # Statistical charts
```

Database Schema (SQLite)

```sql
-- Sections
CREATE TABLE sections (
    id INTEGER PRIMARY KEY,
    section_type TEXT,            -- 'document', 'section', 'subsection', 'paragraph'
    parent_id INTEGER,
    title TEXT,
    content TEXT,
    line_start INTEGER,
    line_end INTEGER,
    hierarchy_level INTEGER,
    path TEXT,                    -- e.g., "Overview/Being helpful/Why helpfulness"
    FOREIGN KEY (parent_id) REFERENCES sections(id)
);

-- Variables (behavioral factors)
CREATE TABLE variables (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE,             -- e.g., "broadly safe", "honesty"
    category TEXT,                -- 'core_value', 'priority', 'factor', 'constraint'
    priority_level INTEGER,      -- 1-4, or NULL
    is_hard_constraint BOOLEAN,
    principal_assignment TEXT,    -- 'anthropic', 'operator', 'user', 'all'
    frequency INTEGER DEFAULT 0,
    description TEXT
);

-- Variable occurrences (linking variables to content)
CREATE TABLE variable_occurrences (
    id INTEGER PRIMARY KEY,
    variable_id INTEGER,
    section_id INTEGER,
    sentence_id INTEGER,
    context TEXT,
    FOREIGN KEY (variable_id) REFERENCES variables(id),
    FOREIGN KEY (section_id) REFERENCES sections(id)
);
```
```sql
-- Sentences
CREATE TABLE sentences (
    id INTEGER PRIMARY KEY,
    section_id INTEGER,
    text TEXT,
    sentence_number INTEGER,
    line_number INTEGER,
    FOREIGN KEY (section_id) REFERENCES sections(id)
);

-- Embeddings (hierarchical)
CREATE TABLE embeddings (
    id INTEGER PRIMARY KEY,
    content_id INTEGER,
    content_type TEXT,            -- 'document', 'section', 'sentence', 'variable'
    embedding BLOB,               -- Float32 array
    embedding_dim INTEGER DEFAULT 768,
    chunk_start INTEGER,
    chunk_end INTEGER,
    FOREIGN KEY (content_id) REFERENCES sections(id) ON DELETE CASCADE
);

-- Similarity scores (pre-computed)
CREATE TABLE similarity (
    id INTEGER PRIMARY KEY,
    content_id_1 INTEGER,
    content_id_2 INTEGER,
    similarity_score REAL,
    FOREIGN KEY (content_id_1) REFERENCES sections(id),
    FOREIGN KEY (content_id_2) REFERENCES sections(id)
);

-- Statistics cache
CREATE TABLE statistics (
    id INTEGER PRIMARY KEY,
    metric_name TEXT UNIQUE,
    metric_value REAL,
    json_data TEXT
);
```
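The Float32 BLOB column in the embeddings table can be exercised with a short standard-library round-trip. This sketch assumes the schema above and uses `array('f')` where the pipeline would use numpy:

```python
import sqlite3
from array import array

# Create just the embeddings table in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE embeddings (
        id INTEGER PRIMARY KEY,
        content_id INTEGER,
        content_type TEXT,
        embedding BLOB,
        embedding_dim INTEGER DEFAULT 768
    )
""")

# Store a 768-dim Float32 vector as raw bytes.
vec = array("f", [0.1] * 768)
conn.execute(
    "INSERT INTO embeddings (content_id, content_type, embedding, embedding_dim) "
    "VALUES (?, ?, ?, ?)",
    (1, "sentence", vec.tobytes(), len(vec)),
)

# Read it back and reconstruct the Float32 array.
blob, dim = conn.execute(
    "SELECT embedding, embedding_dim FROM embeddings"
).fetchone()
restored = array("f")
restored.frombytes(blob)
assert len(restored) == dim == 768
```

With numpy the same bytes would round-trip via `np.frombuffer(blob, dtype=np.float32)`.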

Phase 2: Data Extraction Pipeline

2.1 Document Parser (data_processor.py)

Inputs: claudes-constitution.md
Outputs: Structured data for the database
Operations:

  1. Parse markdown hierarchy
    • Identify document, sections (##), subsections (###), paragraphs
    • Extract titles, content, line numbers
    • Build hierarchical tree structure
    • Generate path strings for each section
  2. Sentence segmentation
    • Split paragraphs into sentences using NLTK/spacy
    • Preserve line number references
    • Identify sentence boundaries
  3. Variable extraction
    • Extract core values (Broadly Safe, Broadly Ethical, etc.)
    • Extract priority numbers (1. 2. 3. 4.)
    • Extract hard constraints
    • Extract factors mentioned (safety, ethics, helpfulness, etc.)
    • Extract principal assignments (Anthropic, operators, users)
    • Extract behavioral rules and conditions
  4. Constraint classification
    • Tag hard constraints vs soft preferences
    • Identify absolute "never" statements
    • Identify conditional "if-then" structures

2.2 Metadata Builder (metadata_builder.py)

Metadata per Variable (coefficient_score is calculated from priority and frequency):

```json
{
  "id": 1,
  "name": "broadly safe",
  "category": "core_value",
  "priority_level": 1,
  "is_hard_constraint": false,
  "principal_assignment": "all",
  "frequency": 47,
  "mentions": [
    {
      "section_id": 132,
      "section_title": "Claude's core values",
      "sentence_ids": [1234, 1235, 1236],
      "contexts": ["not undermining appropriate human mechanisms...", "most critical property..."]
    }
  ],
  "related_variables": [
    {"id": 2, "name": "broadly ethical", "relationship": "lower_priority"},
    {"id": 3, "name": "anthropic_guidelines", "relationship": "lower_priority"}
  ],
  "definition": "not undermining appropriate human mechanisms to oversee AI during current phase of development",
  "coefficient_score": 0.95,
  "hierarchy_position": "top",
  "weight": 1.0
}
```
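The variable-extraction and constraint-classification steps above can be sketched with simple pattern rules. The regexes and labels here are illustrative assumptions, not the plan's actual heuristics:

```python
import re

# Absolute prohibitions mark hard constraints; "if ... then/should/may"
# patterns mark conditional rules; everything else is a soft preference.
HARD = re.compile(r"\b(never|must not|under no circumstances)\b", re.IGNORECASE)
CONDITIONAL = re.compile(r"\bif\b.+\b(then|should|may)\b", re.IGNORECASE)

def classify_sentence(text: str) -> str:
    """Tag one sentence as hard_constraint, conditional, or soft_preference."""
    if HARD.search(text):
        return "hard_constraint"
    if CONDITIONAL.search(text):
        return "conditional"
    return "soft_preference"
```

A production classifier would need many more patterns plus manual review, which is why Phase 10 includes a validation checklist.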

Phase 3: Quantitative Analysis (quantitative.py)

3.1 Token-Level Metrics

  • Total tokens per section
  • Average tokens per sentence
  • Vocabulary size
  • Token frequency distribution
  • Type-token ratio

3.2 TF-IDF Analysis

  • Build document-term matrix
  • Calculate TF-IDF scores for each variable/term
  • Identify key terms per section
  • Cross-section term comparison
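The TF-IDF scoring above can be illustrated with a hand-rolled sketch; the real pipeline would presumably use scikit-learn's TfidfVectorizer, and the toy sections and whitespace tokenization here are invented for illustration:

```python
import math
from collections import Counter

# Two toy "sections" as token lists.
sections = {
    "safety": "claude must remain broadly safe and safe for users".split(),
    "helpfulness": "claude should be genuinely helpful to users".split(),
}

def tfidf(term: str, doc: list, docs: dict) -> float:
    """Term frequency in one doc times inverse document frequency."""
    tf = Counter(doc)[term] / len(doc)
    df = sum(term in d for d in docs.values())
    idf = math.log(len(docs) / df)
    return tf * idf

# "safe" appears only in the safety section, so it scores above zero there;
# "claude" appears in every section, so its IDF (and score) is zero.
safe_score = tfidf("safe", sections["safety"], sections)
claude_score = tfidf("claude", sections["safety"], sections)
```

This zero-for-ubiquitous-terms behavior is what makes TF-IDF useful for the "key terms per section" step.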

3.3 Priority Weighting

```python
priority_weights = {
    "broadly_safe": 1.0,
    "broadly_ethical": 0.75,
    "anthropic_guidelines": 0.5,
    "genuinely_helpful": 0.25,
}

coefficient_score = (priority_weight * 0.6) + (frequency_normalized * 0.3) + (semantic_centrality * 0.1)
```

3.4 Network Centrality Measures

  • Build variable co-occurrence graph
  • Calculate degree centrality
  • Calculate betweenness centrality
  • Calculate eigenvector centrality
  • Identify hub and authority nodes

3.5 Statistical Summaries

```json
{
  "total_variables": 156,
  "core_values": 4,
  "hard_constraints": 6,
  "soft_factors": 146,
  "sections": 47,
  "sentences": 3428,
  "total_tokens": 42156,
  "unique_tokens": 3847,
  "avg_sentence_length": 12.3,
  "priority_distribution": {"priority_1": 1, "priority_2": 1, "priority_3": 1, "priority_4": 1},
  "constraint_distribution": {"hard": 6, "soft": 150},
  "variable_frequency_histogram": {...}
}
```
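The co-occurrence graph and degree centrality from 3.4 can be sketched with the standard library alone; in practice the plan's networkx dependency would supply these measures, and the sample sentence-to-variable sets here are hypothetical:

```python
from collections import defaultdict
from itertools import combinations

# Variables mentioned in the same sentence become connected nodes.
sentence_vars = [
    {"safety", "ethics"},
    {"safety", "helpfulness"},
    {"safety", "honesty"},
]

graph = defaultdict(set)
for variables in sentence_vars:
    for a, b in combinations(sorted(variables), 2):
        graph[a].add(b)
        graph[b].add(a)

# Degree centrality: fraction of other nodes each node touches.
n = len(graph)
degree_centrality = {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}
```

With networkx this collapses to `nx.degree_centrality(G)`, and betweenness/eigenvector centrality come from the analogous one-liners.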

Phase 4: Semantic Analysis (semantic_analyzer.py)

4.1 Embedding Generation (via Ollama)

```python
import numpy as np
import ollama

# Generate embeddings using nomic-embed-text
def generate_embedding(text: str) -> np.ndarray:
    response = ollama.embeddings(model='nomic-embed-text', prompt=text)
    return np.array(response['embedding'], dtype=np.float32)
```

4.2 Hierarchical Embeddings

  1. Document-level: Embed entire constitution
  2. Section-level: Embed each section
  3. Subsection-level: Embed each subsection
  4. Sentence-level: Embed each sentence
  5. Variable-level: Embed variable descriptions + contexts

4.3 Chunking Strategy
  • Sentences < 512 tokens: embed as-is
  • Longer content: chunk into ~500-token segments with 50-token overlap
  • Store chunk metadata (start, end, parent)

4.4 Semantic Similarity

  • Compute cosine similarity between all pairs
  • Pre-compute for top-K neighbors
  • Cache in similarity table
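The pairwise score cached in the similarity table is plain cosine similarity. A pure-Python sketch (in practice numpy would vectorize this across all pairs):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Identical directions score 1.0; orthogonal vectors score 0.0.
same = cosine_similarity([1.0, 0.0], [1.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

nomic-embed-text vectors are 768-dimensional, so pre-computing only top-K neighbors (as the plan says) keeps the similarity table manageable.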

4.5 Clustering

  • K-means clustering on variable embeddings
  • Identify semantic clusters
  • Assign cluster IDs to variables

Phase 5: HTML Interface Design

5.1 UI Layout (Dark Mode)

```
┌─────────────────────────────────────────────────────────────────┐
│ Claude's Constitution Analysis System                           │
├─────────────────────────────────────────────────────────────────┤
│ [Search: _____________________]   [Filter ▼]   [Export ▼]       │
├─────────────────────────────────────────────────────────────────┤
│ ┌─────────────┬─────────────────────────────────────────────┐   │
│ │ Sidebar     │ Main Content Area                           │   │
│ │             │                                             │   │
│ │ Navigation: │ Tabbed Interface:                           │   │
│ │ • Overview  │ ├─ Variables Table                          │   │
│ │ • Variables │ ├─ Network Graph                            │   │
│ │ • Sections  │ ├─ Charts & Metrics                         │   │
│ │ • Statistics│ └─ Document Viewer                          │   │
│ │ • Search    │                                             │   │
│ │             │                                             │   │
│ │ Filters:    │                                             │   │
│ │ [ ] Core    │                                             │   │
│ │ [ ] Hard    │                                             │   │
│ │ [ ] Soft    │                                             │   │
│ │ [ ] Pri 1-4 │                                             │   │
│ └─────────────┴─────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘
```

5.2 Color Palette (Professional Dark Mode)

```css
:root {
  --bg-primary: #0f0f0f;
  --bg-secondary: #1a1a1a;
  --bg-tertiary: #242424;
  --text-primary: #e0e0e0;
  --text-secondary: #a0a0a0;
  --accent-blue: #3b82f6;
  --accent-green: #10b981;
  --accent-orange: #f59e0b;
  --accent-red: #ef4444;
  --border-color: #333333;
  --shadow: rgba(0, 0, 0, 0.5);
}
```

5.3 Main Features

A. Overview Dashboard

  • Key statistics cards (total variables, constraints, sections, etc.)
  • Priority distribution pie chart
  • Variable frequency bar chart
  • Quick summary metrics

B. Variables Table
  • Sortable columns: Name, Category, Priority, Frequency, Coefficient
  • Filterable by category, priority level, constraint type
  • Click to expand with detailed metadata
  • Semantic similarity indicator

C. Network Graph (D3.js)
  • Nodes: Variables (sized by coefficient)
  • Edges: Co-occurrence relationships (weighted by frequency)
  • Color-coded by priority level
  • Interactive: hover details, click to highlight
  • Force-directed layout
  • Zoom/pan controls

D. Statistical Charts (Chart.js)
  • Token frequency histogram
  • Sentence length distribution
  • TF-IDF heatmap (variables × sections)
  • Centrality measures comparison
  • Embedding PCA/t-SNE scatter plot

E. Document Viewer
  • Hierarchical tree view of constitution
  • Highlight variable mentions
  • Click to jump to context
  • Inline statistics per section

F. Full-Text Search
  • Real-time search across all content
  • Fuzzy matching
  • Results ranked by relevance (TF-IDF + semantic similarity)
  • Contextual excerpts
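The relevance ranking described above (TF-IDF plus semantic similarity) could be a weighted blend of the two scores; the weights and result fields below are illustrative assumptions, not values from the plan:

```python
def rank_results(results, lexical_weight=0.6, semantic_weight=0.4):
    """Sort search hits by a weighted sum of lexical and semantic scores."""
    return sorted(
        results,
        key=lambda r: lexical_weight * r["tfidf"] + semantic_weight * r["semantic"],
        reverse=True,
    )

hits = [
    {"id": 1, "tfidf": 0.2, "semantic": 0.9},
    {"id": 2, "tfidf": 0.8, "semantic": 0.1},
]
top = rank_results(hits)[0]
```

Since the interface runs offline in the browser, the JavaScript version would apply the same formula over the pre-computed scores shipped in the JSON exports.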

Phase 6: Implementation Scripts

6.1 Main Analysis Script (main.py)

```python
#!/usr/bin/env python3
"""
Main analysis pipeline for Claude's Constitution.
Run this script to perform the full analysis and generate the HTML interface.
"""
# Assumes DocumentProcessor, DatabaseManager, QuantitativeAnalyzer,
# SemanticAnalyzer, and MetadataBuilder are imported from the analysis modules.

def main():
    print("Starting Claude's Constitution Analysis...")

    # Step 1: Parse document
    print("1. Parsing document...")
    processor = DocumentProcessor("claudes-constitution.md")
    sections = processor.parse()
    sentences = processor.extract_sentences()

    # Step 2: Extract variables
    print("2. Extracting variables...")
    variables = processor.extract_variables()
    constraints = processor.classify_constraints()

    # Step 3: Build database
    print("3. Building database...")
    db = DatabaseManager("constitution.db")
    db.create_tables()
    db.populate(sections, sentences, variables, constraints)

    # Step 4: Quantitative analysis
    print("4. Performing quantitative analysis...")
    quant_analyzer = QuantitativeAnalyzer(db)
    tfidf_scores = quant_analyzer.compute_tfidf()
    centrality = quant_analyzer.compute_centrality()
    statistics = quant_analyzer.generate_statistics()

    # Step 5: Generate embeddings
    print("5. Generating semantic embeddings...")
    semantic_analyzer = SemanticAnalyzer(db)
    semantic_analyzer.generate_all_embeddings()
    semantic_analyzer.compute_similarities()

    # Step 6: Build metadata
    print("6. Building metadata...")
    metadata = MetadataBuilder(db, quant_analyzer, semantic_analyzer)
    variables_meta = metadata.build_variable_metadata()

    # Step 7: Export JSON for web
    print("7. Exporting data for web...")
    export_data_for_web(variables_meta, statistics, db)

    # Step 8: Generate HTML
    print("8. Generating HTML interface...")
    generate_html_interface()

    print("\n✓ Analysis complete!")
    print("Open web/index.html in your browser to view results")

if __name__ == "__main__":
    main()
```

6.2 Web Data Export

```python
import json

def export_data_for_web(variables_meta, statistics, db):
    """Export all data to JSON files for the web interface."""
    # Variables with full metadata
    with open("data/variables.json", "w") as f:
        json.dump(variables_meta, f, indent=2)

    # Statistics
    with open("data/statistics.json", "w") as f:
        json.dump(statistics, f, indent=2)

    # Sections with embeddings
    sections_data = db.get_sections_with_embeddings()
    with open("data/sections.json", "w") as f:
        json.dump(sections_data, f, indent=2)

    # Network graph data
    graph_data = build_graph_data(variables_meta)
    with open("data/graph.json", "w") as f:
        json.dump(graph_data, f, indent=2)

    # Chart data
    charts_data = prepare_charts_data(statistics, db)
    with open("data/charts.json", "w") as f:
        json.dump(charts_data, f, indent=2)
```

6.3 HTML Generator

```python
def generate_html_interface():
    """Generate the complete HTML interface with embedded data."""
    # Load all data
    variables = load_json("data/variables.json")
    statistics = load_json("data/statistics.json")
    sections = load_json("data/sections.json")
    graph = load_json("data/graph.json")
    charts = load_json("data/charts.json")

    # Generate HTML
    html_content = render_template(
        "templates/index.html",
        variables=variables,
        statistics=statistics,
        sections=sections,
        graph=graph,
        charts=charts,
    )

    with open("web/index.html", "w") as f:
        f.write(html_content)
```
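The load_json and render_template helpers used by the HTML generator are never defined in this plan. A minimal stdlib-only sketch might look like the following, using string.Template in place of a real template engine such as Jinja2; the `$variables`-style placeholders are an assumption of this sketch, not the plan's template syntax:

```python
import json
import string

def load_json(path):
    """Read one of the exported data files."""
    with open(path) as f:
        return json.load(f)

def render_template(template_path, **context):
    """Fill a template whose placeholders look like $variables, $statistics, etc."""
    with open(template_path) as f:
        template = string.Template(f.read())
    # Embed each dataset as a JSON literal the page's scripts can parse,
    # which is what makes the generated page fully functional offline.
    return template.safe_substitute({k: json.dumps(v) for k, v in context.items()})
```

A real implementation would likely swap string.Template for Jinja2, but the idea of inlining the JSON data into the single HTML file carries over unchanged.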

Phase 7: Execution Plan

Step-by-Step Execution

1. Create directory structure

mkdir -p constitution_analysis/{analysis,notebooks,data,web/{css,js}}

2. Install dependencies

pip install nltk spacy numpy pandas scikit-learn networkx ollama
ollama pull nomic-embed-text

(sqlite3 ships with the Python standard library and does not need to be installed.)

3. Download NLTK data

python -m nltk.downloader punkt

4. Run main analysis script

cd constitution_analysis
python analysis/main.py

5. Open in browser

firefox web/index.html

Phase 8: Technical Dependencies

Python Packages

requirements.txt

nltk>=3.8
spacy>=3.7
numpy>=1.24
pandas>=2.0
scikit-learn>=1.3
networkx>=3.2
plotly>=5.18
ollama>=0.1
python-dateutil>=2.8

JavaScript Libraries (via CDN)

  • D3.js (v7) - for network graphs
  • Chart.js (v4) - for statistical charts
  • Alpine.js (v3) - for lightweight interactivity

System Requirements
  • Python 3.10+
  • Ollama with nomic-embed-text model
  • 8GB+ RAM recommended for embeddings
  • Modern web browser for HTML interface

Phase 9: Expected Outputs

Data Files Generated

  1. constitution.db (~50-100MB with embeddings)
  2. variables.json (~500KB)
  3. statistics.json (~50KB)
  4. sections.json (~2MB)
  5. graph.json (~1MB)
  6. charts.json (~500KB)

HTML Interface
  • Single self-contained HTML file (~5-10MB with embedded data)
  • Fully functional offline
  • Queryable search
  • Interactive visualizations
  • Dark mode, professional design

Analysis Outputs
  • Variable taxonomy with 150+ entries
  • Network graph with variable relationships
  • Statistical dashboards
  • Embedding clusters
  • Priority hierarchy visualization

Phase 10: Validation & Testing

Validation Checklist

  • All variables correctly extracted from document
  • Priority levels match document (1-4)
  • Hard constraints accurately identified
  • Embeddings successfully generated for all content
  • Similarity scores computed correctly
  • Database integrity verified
  • HTML loads without errors
  • Search returns relevant results
  • Network graph displays correctly
  • Charts render properly
  • Filters work as expected
  • Dark mode consistent across all elements