# Replicating Beeble's Pipeline with Open-Source Tools
Most Beeble Studio users pay for the extracted passes--alpha mattes,
depth maps, normal maps--rather than for the relighting itself. The
extraction pipeline is built from open-source models, and the PBR
decomposition and relighting stages now have viable open-source
alternatives too.
This guide documents how to replicate each stage using ComfyUI and
direct Python. No workflow JSON files are provided yet, but the
relevant nodes, models, and tradeoffs are documented below.
If you are unfamiliar with ComfyUI, see the official documentation at
https://docs.comfy.org/.
## Pipeline overview
```
Input frame
|
+--> Background removal --> Alpha matte
| (InSPyReNet / BiRefNet)
|
+--> Depth estimation --> Depth map
| (Depth Anything V2)
|
+--> Normal estimation --> Normal map
| (StableNormal / NormalCrafter)
|
+--> PBR decomposition --> Albedo, Roughness, Metallic
| (CHORD / RGB↔X)
|
+--> Relighting --> Relit output
(IC-Light / manual in Blender/Nuke)
```
The first two stages use the exact same models Beeble uses. The
remaining stages use different models that produce comparable outputs.
## 1. Background removal (Alpha matte)
**What Beeble uses**: transparent-background / InSPyReNet (MIT)
This is the simplest stage. Several ComfyUI nodes wrap the same
underlying model:
- **ComfyUI-InSPyReNet** -- wraps the same `transparent-background`
library Beeble uses. Install via ComfyUI Manager.
- **ComfyUI-BiRefNet** -- uses BiRefNet, a newer model that often
produces sharper edges around hair and fine detail.
- **ComfyUI-RMBG** -- BRIA's background removal model, another
strong alternative.
For video, connect an image sequence loader to the removal node and
export the alpha channel as a separate pass. These models process
per-frame, so there is no temporal consistency--but alpha mattes are
typically stable enough that this is not a problem.
**Direct Python**:
```bash
pip install transparent-background
```
```python
from PIL import Image
from transparent_background import Remover

remover = Remover()  # downloads InSPyReNet weights on first use
image = Image.open('frame_0001.png').convert('RGB')  # illustrative path
alpha = remover.process(image, type='map')  # 'map' returns the matte only
```
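For image sequences, the same call is simply looped per frame. A
minimal sketch of that loop (the `process` callable is injected so any
matting backend -- e.g. a wrapper around `remover.process` -- can be
dropped in; filenames are illustrative):

```python
from pathlib import Path

def alpha_pass(frame_paths, process):
    """Run a per-frame matting model over an image sequence.

    `process` maps one frame path to an alpha matte. Frames are
    handled independently -- which is exactly why these models have
    no temporal consistency across the sequence.
    """
    results = []
    for path in sorted(frame_paths):
        matte = process(path)
        # Name each matte after its source frame, e.g. shot_0001_alpha
        results.append((Path(path).stem + "_alpha", matte))
    return results
```

The mattes can then be written out as a separate pass alongside the
RGB frames.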
## 2. Depth estimation
**What Beeble uses**: Depth Anything V2 via Kornia (Apache 2.0)
- **ComfyUI-DepthAnythingV2** -- dedicated nodes for all model sizes.
- **comfyui_controlnet_aux** -- includes Depth Anything V2 as a
preprocessor option.
Use the `large` variant for best quality. This is a per-frame model
with no temporal information, but monocular depth tends to be stable
across frames for most footage.
**Direct Python**:
```bash
pip install kornia
```
```python
import torch
# Exact module path may differ across kornia versions; check the docs.
from kornia.contrib import DepthAnything

model = DepthAnything.from_pretrained("depth-anything-v2-large")
image_tensor = torch.rand(1, 3, 518, 518)  # B, C, H, W in [0, 1]
depth = model(image_tensor)
```
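Depth Anything V2 outputs relative depth with arbitrary scale, so
passes are usually normalized before export. A minimal per-frame
normalization sketch (plain lists for clarity; in practice this would
be a NumPy or torch operation):

```python
def depth_to_uint16(depth):
    """Normalize a relative depth map to the 16-bit range for export.

    Caveat: per-frame min/max normalization can flicker across a
    video; for sequences, compute lo/hi once over the whole clip.
    """
    flat = [v for row in depth for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero on flat maps
    return [[round((v - lo) / span * 65535) for v in row] for row in depth]
```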
## 3. Normal estimation
**What Beeble claims**: The CVPR 2024 paper describes a dedicated
"Normal Net" within SwitchLight. However, analysis of the deployed
application found no evidence of this specific architecture--the
PBR models appear to use standard encoder-decoder segmentation
frameworks with pretrained backbones (see [REPORT.md](REPORT.md)
section 4 for details).
Multiple open-source models now produce high-quality surface normals
from single images, and one handles video with temporal consistency.
### For single images
- **StableNormal** (SIGGRAPH Asia 2024) -- currently best benchmarks
for monocular normal estimation. Uses a two-stage coarse-to-fine
strategy with DINOv2 semantic features for guidance. A turbo variant
runs 10x faster with minimal quality loss.
GitHub: https://github.com/Stable-X/StableNormal
- **DSINE** (CVPR 2024) -- discriminative CNN-based approach. No
diffusion overhead, so it is fast. Competitive with StableNormal on
NYUv2 benchmarks. Good choice when inference speed matters.
GitHub: https://github.com/baegwangbin/DSINE
- **GeoWizard** (ECCV 2024) -- jointly predicts depth AND normals
from a single image, which guarantees geometric consistency between
the two. Available in ComfyUI via ComfyUI-Geowizard.
GitHub: https://github.com/fuxiao0719/GeoWizard
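Whichever model you use, the output convention matters when passing
normals downstream: normal maps store unit vectors remapped into
image range. A small encode/decode sketch (assumes the common
OpenGL-style convention; some tools flip the green channel):

```python
def decode_normal(rgb):
    """Map an 8-bit normal-map pixel (0..255) back to a [-1, 1] vector."""
    return tuple(c / 255.0 * 2.0 - 1.0 for c in rgb)

def encode_normal(vec):
    """Map a [-1, 1] normal vector into 8-bit image range."""
    return tuple(round((c + 1.0) / 2.0 * 255.0) for c in vec)
```

The "flat" normal (0, 0, 1) encodes to the familiar lavender
(128, 128, 255) seen in tangent-space normal maps.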
### For video (temporally consistent normals)
- **NormalCrafter** (2025) -- this is the most relevant model for
replicating Beeble's video pipeline. It uses video diffusion priors
to produce temporally consistent normal maps across frames,
directly comparable to SwitchLight 3.0's "true video model" claim.
Has ComfyUI nodes via ComfyUI-NormalCrafterWrapper.
Wrapper: https://github.com/AIWarper/ComfyUI-NormalCrafterWrapper
Paper: https://arxiv.org/abs/2504.11427
Key parameters for ComfyUI:
- `window_size`: number of frames processed together (default 14).
Larger = better temporal consistency, more VRAM.
- `time_step_size`: how far the window slides. Set smaller than
window_size for overlapping windows and smoother transitions.
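The interaction between the two parameters can be sketched as a
sliding-window schedule (an illustrative helper, not NormalCrafter's
actual code):

```python
def window_starts(n_frames, window_size=14, time_step_size=7):
    """Start indices of the frame windows a NormalCrafter-style model
    processes. time_step_size < window_size yields overlapping
    windows, whose shared frames are blended for smoother transitions."""
    if n_frames <= window_size:
        return [0]
    starts = list(range(0, n_frames - window_size + 1, time_step_size))
    if starts[-1] != n_frames - window_size:
        # Add a final window so the tail of the clip is covered.
        starts.append(n_frames - window_size)
    return starts
```

With 30 frames, `window_size=14`, and `time_step_size=7`, windows
start at frames 0, 7, 14, and 16, each overlapping its neighbor.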
**Assessment**: For static images, StableNormal likely matches or
exceeds Beeble's normal quality, since it is a specialized model
rather than one sub-network within a larger system. For video,
NormalCrafter addresses the temporal consistency problem that was
previously a key differentiator of Beeble's pipeline.
## 4. PBR material decomposition (Albedo, Roughness, Metallic)
**What Beeble claims**: The CVPR 2024 paper describes a "Specular Net"
and analytical albedo derivation using a Cook-Torrance reflectance
model. Analysis of the deployed application found no Cook-Torrance,
BRDF, or physics-based rendering terminology in the binary. The PBR
models appear to use standard segmentation architectures
(segmentation_models_pytorch with pretrained backbones) trained on
proprietary portrait data. See [REPORT.md](REPORT.md) section 4.
Regardless of how Beeble implements PBR decomposition, this is the
hardest stage to replicate with open-source tools. Beeble's model was
trained on portrait and human subject data. The open-source
alternatives were trained on different data, which affects quality
for human subjects.
### Available models
- **CHORD** (Ubisoft La Forge, SIGGRAPH Asia 2025) -- the most
complete open-source option. Decomposes a single image into base
color, normal, height, roughness, and metalness using chained
diffusion. Has official ComfyUI nodes from Ubisoft. Weights on
HuggingFace (`Ubisoft/ubisoft-laforge-chord`).
GitHub: https://github.com/ubisoft/ComfyUI-Chord
**License: Research-only (Ubisoft ML License)**
Limitation: trained on the MatSynth dataset (~5700 PBR materials),
which is texture/material focused. Results on human skin, hair, and
clothing will be plausible but not specifically optimized for
portrait data. The authors note metalness prediction is notably
difficult.
- **RGB↔X** (Adobe, SIGGRAPH 2024) -- decomposes into albedo,
roughness, metallicity, normals, AND estimates lighting. Trained on
interior scenes. Fully open-source code and weights.
GitHub: https://github.com/zheng95z/rgbx
Minimum 12GB VRAM recommended.
Limitation: trained on interior scene data, not portrait/human
data. The albedo estimation for rooms and furniture is strong; for
human subjects it is less well-characterized.
- **PBRify Remix** -- simpler model for generating PBR maps from
diffuse textures. Trained on CC0 data from ambientCG, so no license
concerns. Designed for game texture upscaling rather than
photographic decomposition.
GitHub: https://github.com/Kim2091/PBRify_Remix
### The honest gap
Beeble's PBR model was trained on portrait and human subject data
(likely lightstage captures, based on the CVPR paper). The
open-source alternatives were trained on material textures or interior
scenes. For portrait work, this means:
- Skin subsurface scattering properties will be better captured by
Beeble's model
- Hair specularity and anisotropy are hard for general-purpose models
- Clothing material properties (roughness, metallic) should be
comparable
For non-portrait subjects (products, environments, objects), the
open-source models may actually perform better since they were trained
on more diverse material data.
If your goal is manual relighting in Blender or Nuke rather than
automated AI relighting, "good enough" PBR passes are often
sufficient because you have artistic control over the final result.
### On training data and the "moat"
The CVPR paper frames lightstage training data as a significant
competitive advantage. This deserves scrutiny from VFX professionals.
For PBR decomposition training, what you actually need is a dataset
of images paired with ground-truth PBR maps--albedo, normal,
roughness, metallic. Physical lightstage captures are one way to
obtain this data, but modern synthetic rendering provides the same
thing more cheaply and at greater scale:
- **Blender character generators** (Human Generator, MB-Lab, MPFB2):
produce characters with known material properties that can be
rendered procedurally. Blender's Cycles renderer outputs physically
accurate PBR passes natively. Fully open source, no licensing
restrictions for AI training.
- **Houdini procedural pipelines**: can generate hundreds of
thousands of unique character/lighting/pose combinations
programmatically.
- ~~**Unreal Engine MetaHumans**~~: photorealistic digital humans
with full PBR material definitions. However, **the MetaHuman EULA
explicitly prohibits using MetaHumans as AI training data**: "You
must ensure that your activities with the Licensed Technology do
not result in using the Licensed Technology as a training input or
prompt-based input into any Generative AI Program." MetaHumans can
be used within AI-enhanced workflows but not to train AI models.
The ground truth is inherent in synthetic rendering: you created the
scene, so you already have the PBR maps. A VFX studio with a
standard character pipeline could generate a training dataset in a
week.
### Existing datasets and published results
The lightstage data advantage that the CVPR paper frames as a
competitive moat was real in 2023-2024. It has since eroded.
**Public OLAT datasets now rival Beeble's scale:**
- **POLAR** (Dec 2025, public) -- 220 subjects, 156 light directions,
32 views, 4K, 28.8 million images total. Beeble's CVPR paper reports
287 subjects; POLAR offers 77% of that subject count and is freely
available.
https://rex0191.github.io/POLAR/
- **HumanOLAT** (ICCV 2025, public gated) -- 21 subjects, full body,
40 cameras at 6K, 331 LEDs. The first public full-body OLAT dataset.
https://vcai.mpi-inf.mpg.de/projects/HumanOLAT/
**Synthetic approaches already match lightstage quality:**
- **SynthLight** (Adobe/Yale, CVPR 2025) -- trained purely on ~350
synthetic 3D heads rendered in Blender with PBR materials. Achieves
results comparable to lightstage-trained methods on lightstage test
data. No lightstage data used at all.
https://vrroom.github.io/synthlight/
- **NVIDIA Lumos** (SIGGRAPH Asia 2022) -- rendered 300k synthetic
samples in a virtual lightstage. Matched state-of-the-art
lightstage methods three years ago.
- **OpenHumanBRDF** (July 2025) -- 147 human models with full PBR
decomposition including SSS, built in Blender. Exactly the kind
of dataset needed for training PBR decomposition models.
https://arxiv.org/abs/2507.18385
**Cost to replicate:** Generating a competitive synthetic dataset
costs approximately $4,500-$18,000 total (Blender + MPFB2 for
character generation, Cycles for rendering, cloud GPUs for compute).
Raw GPU compute for 100k PBR renders is approximately $55 on an A100.
CHORD (Ubisoft) trained its PBR decomposition model in 5.2 days on
a single H100, costing approximately $260-500 in compute.
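Those figures follow from simple arithmetic. A sketch of the estimate
(the ~1 s per render, $2/hr A100 rate, and $2-4/hr H100 range are
illustrative assumptions, not measured numbers):

```python
def compute_cost_usd(gpu_hours, usd_per_hour):
    """Cloud GPU cost for a rendering or training job."""
    return gpu_hours * usd_per_hour

# 100k PBR renders at ~1 s each on an A100 at ~$2/hr:
render_hours = 100_000 * 1 / 3600               # ~27.8 GPU-hours
render_cost = compute_cost_usd(render_hours, 2.0)  # ~$55

# CHORD-style training: 5.2 days on one H100 at $2-4/hr:
train_hours = 5.2 * 24                          # 124.8 GPU-hours
low = compute_cost_usd(train_hours, 2.0)        # ~$250
high = compute_cost_usd(train_hours, 4.0)       # ~$500
```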
With model sizes under 2 GB (based on the encrypted model files in
Beeble's distribution) and standard encoder-decoder architectures,
the compute cost to train equivalent models from synthetic data is
modest--well within reach of independent researchers or small studios.
This does not mean Beeble's trained weights are worthless. But the
barrier to replication is lower than the marketing suggests,
especially given that the model architectures are standard
open-source frameworks and equivalent training data is now publicly
available.
## 5. Relighting
**What Beeble claims**: The CVPR paper describes a "Render Net" for
relighting. This is the least well-characterized stage in our
analysis--the relighting model's architecture could not be determined
from the available evidence.
### NVIDIA DiffusionRenderer (replaces both PBR decomposition AND relighting)
This is the most significant recent development. NVIDIA's
DiffusionRenderer does the same thing as Beeble's entire core
pipeline--video to PBR passes plus relighting--in a single open-source
system.
- **DiffusionRenderer** (NVIDIA, CVPR 2025 Oral) -- a general-purpose
method for both neural inverse and forward rendering. Two modes:
- **Inverse**: input image/video → geometry and material buffers
(albedo, normals, roughness, metallic)
- **Forward**: G-buffers + environment map → photorealistic relit
output
The upgraded **Cosmos DiffusionRenderer** (June 2025) brings
improved quality powered by NVIDIA Cosmos video foundation models.
GitHub: https://github.com/nv-tlabs/cosmos-transfer1-diffusion-renderer
Academic version: https://github.com/nv-tlabs/diffusion-renderer
Weights: https://huggingface.co/collections/zianw/cosmos-diffusionrenderer-6849f2a4da267e55409b8125
**License: Apache 2.0 (code), NVIDIA Open Model License (weights)**
Hardware: approximately 16GB VRAM recommended.
**ComfyUI integration**: A community wrapper exists at
https://github.com/eggsbenedicto/DiffusionRenderer-ComfyUI
(experimental, Linux tested). Requires downloading the Cosmos
DiffusionRenderer checkpoints and NVIDIA Video Tokenizer
(Cosmos-1.0-Tokenizer-CV8x8x8).
This is a direct, open-source replacement for Beeble's core value
proposition, backed by NVIDIA's resources and presented as an oral
paper at CVPR 2025.
### IC-Light (image relighting)
- **IC-Light** (ICLR 2025, by lllyasviel / ControlNet creator) --
the leading open-source image relighting model. Two modes:
text-conditioned (describe the target lighting) and
background-conditioned (provide a background image whose lighting
should be matched). Based on Stable Diffusion. V2 available with
16-channel VAE.
GitHub: https://github.com/lllyasviel/IC-Light
IC-Light uses diffusion-based lighting transfer rather than
physics-based rendering. The results look different--less physically
precise but more flexible in terms of creative lighting scenarios.
Available in ComfyUI via multiple community node packages.
### Manual relighting with PBR passes
If you have normal, albedo, roughness, and metallic maps from steps
3-4, you can do relighting directly in any 3D application:
- **Blender**: Import the passes as textures on a plane, apply a
Principled BSDF shader, and light the scene with any HDRI or light
setup. This gives you full artistic control.
- **Nuke**: Use the PBR passes with Nuke's relighting nodes for
compositing-native workflows.
- **Unreal Engine**: Import as material textures for real-time PBR
rendering.
This approach is arguably more powerful than SwitchLight for
professional VFX work because you have complete control over the
lighting. The tradeoff is that it requires manual setup rather than
one-click processing.
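The manual approach boils down to standard shading math. As an
illustration of what the 3D package does with the extracted passes,
here is Lambertian (diffuse-only) shading of one pixel from a normal
and a light direction -- a toy model that ignores the roughness,
metallic, and specular terms a Principled BSDF adds on top:

```python
import math

def lambert_shade(albedo, normal, light_dir, light_color=(1.0, 1.0, 1.0)):
    """Diffuse shading for one pixel: albedo * light * max(N.L, 0).

    `normal` and `light_dir` are 3-vectors; normals come from the
    extracted normal pass (decoded from [0, 255] into [-1, 1])."""
    nlen = math.sqrt(sum(c * c for c in normal)) or 1.0
    llen = math.sqrt(sum(c * c for c in light_dir)) or 1.0
    ndotl = sum((n / nlen) * (l / llen) for n, l in zip(normal, light_dir))
    k = max(ndotl, 0.0)  # surfaces facing away receive no light
    return tuple(a * c * k for a, c in zip(albedo, light_color))
```

A pixel facing the light keeps its full albedo; one facing away goes
to black. Changing `light_dir` and `light_color` per pixel is, in
essence, what relighting with PBR passes does.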
## 6. Feature extraction and segmentation
**What Beeble uses**: DINOv2 via timm (feature extraction),
segmentation_models_pytorch (segmentation)
These are intermediate pipeline components in Beeble's architecture.
DINOv2 produces feature maps that feed into other models, and the
segmentation model likely handles scene parsing or material
classification.
Most users replicating Beeble's outputs will not need these directly.
StableNormal already uses DINOv2 features internally, and CHORD
handles its own segmentation. If you do need them:
```bash
pip install timm segmentation-models-pytorch
```
```python
import timm
import torch

# DINOv2 ViT-L/14; this variant expects 518x518 inputs and is used
# as a frozen feature extractor, not a classifier.
model = timm.create_model('vit_large_patch14_dinov2.lvd142m',
                          pretrained=True).eval()
with torch.no_grad():
    features = model.forward_features(torch.randn(1, 3, 518, 518))
```
## Comparison with Beeble
| Pipeline stage | Beeble model | Open-source equivalent | Parity |
|-------------|-------------|----------------------|--------|
| Person detection | RT-DETR (open source) | RT-DETR / YOLOv8 | Identical (same model) |
| Face detection | Kornia face detection (open source) | Kornia / RetinaFace | Identical (same model) |
| Tracking | BoxMOT (open source) | BoxMOT / ByteTrack | Identical (same model) |
| Alpha matte | InSPyReNet (open source) | InSPyReNet / BiRefNet | Identical (same model) |
| Depth map | Depth Anything V2 (open source) | Depth Anything V2 | Identical (same model) |
| Edge detection | DexiNed (open source) | DexiNed | Identical (same model) |
| Normal map | SMP + timm backbone (proprietary weights) | StableNormal / NormalCrafter | Comparable or better |
| Base color | SMP + timm backbone (proprietary weights) | CHORD / RGB↔X | Weaker for portraits |
| Roughness | SMP + timm backbone (proprietary weights) | CHORD / RGB↔X | Weaker for portraits |
| Metallic | SMP + timm backbone (proprietary weights) | CHORD / RGB↔X | Weaker for portraits |
| Specular | SMP + timm backbone (proprietary weights) | CHORD / RGB↔X | Weaker for portraits |
| Super resolution | RRDB-Net (open source) | ESRGAN / Real-ESRGAN | Identical (same model) |
| Relighting | Proprietary (not fully characterized) | DiffusionRenderer / IC-Light / manual | Comparable (DiffusionRenderer) |
| Full inverse+forward rendering | Entire pipeline | DiffusionRenderer (NVIDIA, CVPR 2025) | Direct open-source competitor |
The "Beeble model" column reflects what was found in the application
binary, not what the CVPR paper describes. See
[REPORT.md](REPORT.md) section 4 for the full architecture analysis.
Where open-source matches or exceeds Beeble: alpha, depth, normals,
detection, tracking, edge detection, and super resolution. Every
preprocessing stage in Beeble's pipeline uses the same open-source
models you can use directly. For video normals, NormalCrafter
provides temporal consistency comparable to Beeble's pipeline.
Where Beeble retains an advantage: PBR material decomposition for
human subjects (base color, roughness, metallic, specular). While the
architecture appears to use standard open-source frameworks, the
model was trained on portrait-specific data. The open-source PBR
models were trained on material textures and interior scenes. However,
as discussed above, the barrier to creating equivalent training data
using synthetic rendering is lower than commonly assumed.
Where DiffusionRenderer changes the picture: NVIDIA's
DiffusionRenderer (CVPR 2025 Oral) handles both inverse rendering
(video → PBR maps) and forward rendering (PBR maps + lighting →
relit output) in a single open-source system. This is the first
open-source tool that directly replicates Beeble's entire core
pipeline, including relighting. It is backed by NVIDIA's resources,
uses Apache 2.0 licensing for code, and has a ComfyUI integration
available.
Where open-source wins on flexibility: manual relighting in
Blender/Nuke with the extracted PBR passes gives full artistic control
that Beeble's automated pipeline does not offer.
## What this means for Beeble users
If you primarily use Beeble for alpha mattes and depth maps, you can
replicate those results for free using the exact same models.
If you use Beeble for normal maps, the open-source alternatives are
now competitive and in some cases better, with NormalCrafter solving
the video temporal consistency problem.
If you use Beeble for full PBR decomposition of portrait footage and
need high-quality material properties, Beeble's model still has an
edge due to its portrait-specific training data. But the gap is
narrowing as models like CHORD improve.
If you use Beeble for one-click relighting, NVIDIA's
DiffusionRenderer is a direct open-source competitor that handles both
PBR decomposition and relighting in a single system. IC-Light provides
a diffusion-based alternative, and manual PBR relighting in
Blender/Nuke gives you full artistic control.
The core value proposition of Beeble Studio--beyond the models
themselves--is convenience. It packages everything into a single
application with a render queue, plugin integrations, and a polished
UX. Replicating the pipeline in ComfyUI requires more setup and
technical knowledge, but costs nothing and gives you full control
over every stage.