# Replicating Beeble's Pipeline with Open-Source Tools
Most Beeble Studio users pay for the extracted passes--alpha mattes,
depth maps, normal maps--rather than for the relighting itself. The
extraction pipeline is built from open-source models, and the PBR
decomposition and relighting stages now have viable open-source
alternatives too.
This guide documents how to replicate each stage using ComfyUI and
direct Python. No workflow JSON files are provided yet, but the
relevant nodes, models, and tradeoffs are documented below.
If you are unfamiliar with ComfyUI, see the official documentation at
https://docs.comfy.org/.
## Pipeline overview
```
Input frame
|
+--> Background removal --> Alpha matte
| (InSPyReNet / BiRefNet)
|
+--> Depth estimation --> Depth map
| (Depth Anything V2)
|
+--> Normal estimation --> Normal map
| (StableNormal / NormalCrafter)
|
+--> PBR decomposition --> Albedo, Roughness, Metallic
| (CHORD / RGB↔X)
|
+--> Relighting --> Relit output
(IC-Light / manual in Blender/Nuke)
```
The first two stages use the exact same models Beeble uses. The
remaining stages use different models that produce comparable outputs.
## 1. Background removal (Alpha matte)
**What Beeble uses**: transparent-background / InSPyReNet (MIT)
This is the simplest stage. Several ComfyUI nodes wrap the same
underlying model:
- **ComfyUI-InSPyReNet** -- wraps the same `transparent-background`
library Beeble uses. Install via ComfyUI Manager.
- **ComfyUI-BiRefNet** -- uses BiRefNet, a newer model that often
produces sharper edges around hair and fine detail.
- **ComfyUI-RMBG** -- BRIA's background removal model, another
strong alternative.
For video, connect an image sequence loader to the removal node and
export the alpha channel as a separate pass. These models process
per-frame, so there is no temporal consistency--but alpha mattes are
typically stable enough that this is not a problem.
**Direct Python**:
```bash
pip install transparent-background
```
```python
from PIL import Image
from transparent_background import Remover

remover = Remover()  # downloads InSPyReNet weights on first use
image = Image.open('frame_0001.png').convert('RGB')  # illustrative path
alpha = remover.process(image, type='map')  # 'map' returns the matte only
```
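For image sequences, the same call is simply looped per frame. A
minimal sketch of that loop (the `process` callable is injected so any
matting backend -- e.g. a wrapper around `remover.process` -- can be
dropped in; filenames are illustrative):

```python
from pathlib import Path

def alpha_pass(frame_paths, process):
    """Run a per-frame matting model over an image sequence.

    `process` maps one frame path to an alpha matte. Frames are
    handled independently -- which is exactly why these models have
    no temporal consistency across the sequence.
    """
    results = []
    for path in sorted(frame_paths):
        matte = process(path)
        # Name each matte after its source frame, e.g. shot_0001_alpha
        results.append((Path(path).stem + "_alpha", matte))
    return results
```

The mattes can then be written out as a separate pass alongside the
RGB frames.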
## 2. Depth estimation
**What Beeble uses**: Depth Anything V2 via Kornia (Apache 2.0)
- **ComfyUI-DepthAnythingV2** -- dedicated nodes for all model sizes.
- **comfyui_controlnet_aux** -- includes Depth Anything V2 as a
preprocessor option.
Use the `large` variant for best quality. This is a per-frame model
with no temporal information, but monocular depth tends to be stable
across frames for most footage.
**Direct Python**:
```bash
pip install kornia
```
```python
import torch
# Exact module path may differ across kornia versions; check the docs.
from kornia.contrib import DepthAnything

model = DepthAnything.from_pretrained("depth-anything-v2-large")
image_tensor = torch.rand(1, 3, 518, 518)  # B, C, H, W in [0, 1]
depth = model(image_tensor)
```
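Depth Anything V2 outputs relative depth with arbitrary scale, so
passes are usually normalized before export. A minimal per-frame
normalization sketch (plain lists for clarity; in practice this would
be a NumPy or torch operation):

```python
def depth_to_uint16(depth):
    """Normalize a relative depth map to the 16-bit range for export.

    Caveat: per-frame min/max normalization can flicker across a
    video; for sequences, compute lo/hi once over the whole clip.
    """
    flat = [v for row in depth for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero on flat maps
    return [[round((v - lo) / span * 65535) for v in row] for row in depth]
```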
## 3. Normal estimation
**What Beeble claims**: The CVPR 2024 paper describes a dedicated
"Normal Net" within SwitchLight. However, analysis of the deployed
application found no evidence of this specific architecture--the
PBR models appear to use standard encoder-decoder segmentation
frameworks with pretrained backbones (see [REPORT.md](REPORT.md)
section 4 for details).
Multiple open-source models now produce high-quality surface normals
from single images, and one handles video with temporal consistency.
### For single images
- **StableNormal** (SIGGRAPH Asia 2024) -- currently best benchmarks
for monocular normal estimation. Uses a two-stage coarse-to-fine
strategy with DINOv2 semantic features for guidance. A turbo variant
runs 10x faster with minimal quality loss.
GitHub: https://github.com/Stable-X/StableNormal
- **DSINE** (CVPR 2024) -- discriminative CNN-based approach. No
diffusion overhead, so it is fast. Competitive with StableNormal on
NYUv2 benchmarks. Good choice when inference speed matters.
GitHub: https://github.com/baegwangbin/DSINE
- **GeoWizard** (ECCV 2024) -- jointly predicts depth AND normals
from a single image, which guarantees geometric consistency between
the two. Available in ComfyUI via ComfyUI-Geowizard.
GitHub: https://github.com/fuxiao0719/GeoWizard
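Whichever model you use, the output convention matters when passing
normals downstream: normal maps store unit vectors remapped into
image range. A small encode/decode sketch (assumes the common
OpenGL-style convention; some tools flip the green channel):

```python
def decode_normal(rgb):
    """Map an 8-bit normal-map pixel (0..255) back to a [-1, 1] vector."""
    return tuple(c / 255.0 * 2.0 - 1.0 for c in rgb)

def encode_normal(vec):
    """Map a [-1, 1] normal vector into 8-bit image range."""
    return tuple(round((c + 1.0) / 2.0 * 255.0) for c in vec)
```

The "flat" normal (0, 0, 1) encodes to the familiar lavender
(128, 128, 255) seen in tangent-space normal maps.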
### For video (temporally consistent normals)
- **NormalCrafter** (2025) -- this is the most relevant model for
replicating Beeble's video pipeline. It uses video diffusion priors
to produce temporally consistent normal maps across frames,
directly comparable to SwitchLight 3.0's "true video model" claim.
Has ComfyUI nodes via ComfyUI-NormalCrafterWrapper.
Wrapper: https://github.com/AIWarper/ComfyUI-NormalCrafterWrapper
Paper: https://arxiv.org/abs/2504.11427
Key parameters for ComfyUI:
- `window_size`: number of frames processed together (default 14).
Larger = better temporal consistency, more VRAM.
- `time_step_size`: how far the window slides. Set smaller than
window_size for overlapping windows and smoother transitions.
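The interaction between the two parameters can be sketched as a
sliding-window schedule (an illustrative helper, not NormalCrafter's
actual code):

```python
def window_starts(n_frames, window_size=14, time_step_size=7):
    """Start indices of the frame windows a NormalCrafter-style model
    processes. time_step_size < window_size yields overlapping
    windows, whose shared frames are blended for smoother transitions."""
    if n_frames <= window_size:
        return [0]
    starts = list(range(0, n_frames - window_size + 1, time_step_size))
    if starts[-1] != n_frames - window_size:
        # Add a final window so the tail of the clip is covered.
        starts.append(n_frames - window_size)
    return starts
```

With 30 frames, `window_size=14`, and `time_step_size=7`, windows
start at frames 0, 7, 14, and 16, each overlapping its neighbor.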
**Assessment**: For static images, StableNormal likely matches or
exceeds Beeble's normal quality, since it is a specialized model
rather than one sub-network within a larger system. For video,
NormalCrafter addresses the temporal consistency problem that was
previously a key differentiator of Beeble's pipeline.
## 4. PBR material decomposition (Albedo, Roughness, Metallic)
**What Beeble claims**: The CVPR 2024 paper describes a "Specular Net"
and analytical albedo derivation using a Cook-Torrance reflectance
model. Analysis of the deployed application found no Cook-Torrance,
BRDF, or physics-based rendering terminology in the binary. The PBR
models appear to use standard segmentation architectures
(segmentation_models_pytorch with pretrained backbones) trained on
proprietary portrait data. See [REPORT.md](REPORT.md) section 4.
Regardless of how Beeble implements PBR decomposition, this is the
hardest stage to replicate with open-source tools. Beeble's model was
trained on portrait and human subject data. The open-source
alternatives were trained on different data, which affects quality
for human subjects.
### Available models
- **CHORD** (Ubisoft La Forge, SIGGRAPH Asia 2025) -- the most
complete open-source option. Decomposes a single image into base
color, normal, height, roughness, and metalness using chained
diffusion. Has official ComfyUI nodes from Ubisoft. Weights on
HuggingFace (`Ubisoft/ubisoft-laforge-chord`).
GitHub: https://github.com/ubisoft/ComfyUI-Chord
**License: Research-only (Ubisoft ML License)**
Limitation: trained on the MatSynth dataset (~5700 PBR materials),
which is texture/material focused. Results on human skin, hair, and
clothing will be plausible but not specifically optimized for
portrait data. The authors note metalness prediction is notably
difficult.
- **RGB↔X** (Adobe, SIGGRAPH 2024) -- decomposes into albedo,
roughness, metallicity, normals, AND estimates lighting. Trained on
interior scenes. Fully open-source code and weights.
GitHub: https://github.com/zheng95z/rgbx
Minimum 12GB VRAM recommended.
Limitation: trained on interior scene data, not portrait/human
data. The albedo estimation for rooms and furniture is strong; for
human subjects it is less well-characterized.
- **PBRify Remix** -- simpler model for generating PBR maps from
diffuse textures. Trained on CC0 data from ambientCG, so no license
concerns. Designed for game texture upscaling rather than
photographic decomposition.
GitHub: https://github.com/Kim2091/PBRify_Remix
### The honest gap
Beeble's PBR model was trained on portrait and human subject data
(likely lightstage captures, based on the CVPR paper). The
open-source alternatives were trained on material textures or interior
scenes. For portrait work, this means:
- Skin subsurface scattering properties will be better captured by
Beeble's model
- Hair specularity and anisotropy are hard for general-purpose models
- Clothing material properties (roughness, metallic) should be
comparable
For non-portrait subjects (products, environments, objects), the
open-source models may actually perform better since they were trained
on more diverse material data.
If your goal is manual relighting in Blender or Nuke rather than
automated AI relighting, "good enough" PBR passes are often
sufficient because you have artistic control over the final result.
### On training data and the "moat"
The CVPR paper frames lightstage training data as a significant
competitive advantage. This deserves scrutiny from VFX professionals.
For PBR decomposition training, what you actually need is a dataset
of images paired with ground-truth PBR maps--albedo, normal,
roughness, metallic. Physical lightstage captures are one way to
obtain this data, but modern synthetic rendering provides the same
thing more cheaply and at greater scale:
- **Blender character generators** (Human Generator, MB-Lab, MPFB2):
produce characters with known material properties that can be
rendered procedurally. Blender's Cycles renderer outputs physically
accurate PBR passes natively. Fully open source, no licensing
restrictions for AI training.
- **Houdini procedural pipelines**: can generate hundreds of
thousands of unique character/lighting/pose combinations
programmatically.
- ~~**Unreal Engine MetaHumans**~~: photorealistic digital humans
with full PBR material definitions. However, **the MetaHuman EULA
explicitly prohibits using MetaHumans as AI training data**: "You
must ensure that your activities with the Licensed Technology do
not result in using the Licensed Technology as a training input or
prompt-based input into any Generative AI Program." MetaHumans can
be used within AI-enhanced workflows but not to train AI models.
The ground truth is inherent in synthetic rendering: you created the
scene, so you already have the PBR maps. A VFX studio with a
standard character pipeline could generate a training dataset in a
week.
### Existing datasets and published results
The lightstage data advantage that the CVPR paper frames as a
competitive moat was real in 2023-2024. It has since eroded.
**Public OLAT datasets now rival Beeble's scale:**
- **POLAR** (Dec 2025, public) -- 220 subjects, 156 light directions,
32 views, 4K, 28.8 million images total. Beeble's CVPR paper reports
287 subjects; POLAR offers 77% of that subject count and is freely
available.
https://rex0191.github.io/POLAR/
- **HumanOLAT** (ICCV 2025, public gated) -- 21 subjects, full body,
40 cameras at 6K, 331 LEDs. The first public full-body OLAT dataset.
https://vcai.mpi-inf.mpg.de/projects/HumanOLAT/
**Synthetic approaches already match lightstage quality:**
- **SynthLight** (Adobe/Yale, CVPR 2025) -- trained purely on ~350
synthetic 3D heads rendered in Blender with PBR materials. Achieves
results comparable to lightstage-trained methods on lightstage test
data. No lightstage data used at all.
https://vrroom.github.io/synthlight/
- **NVIDIA Lumos** (SIGGRAPH Asia 2022) -- rendered 300k synthetic
samples in a virtual lightstage. Matched state-of-the-art
lightstage methods three years ago.
- **OpenHumanBRDF** (July 2025) -- 147 human models with full PBR
decomposition including SSS, built in Blender. Exactly the kind
of dataset needed for training PBR decomposition models.
https://arxiv.org/abs/2507.18385
**Cost to replicate:** Generating a competitive synthetic dataset
costs approximately $4,500-$18,000 total (Blender + MPFB2 for
character generation, Cycles for rendering, cloud GPUs for compute).
Raw GPU compute for 100k PBR renders is approximately $55 on an A100.
CHORD (Ubisoft) trained its PBR decomposition model in 5.2 days on
a single H100, costing approximately $260-500 in compute.
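Those figures follow from simple arithmetic. A sketch of the estimate
(the ~1 s per render, $2/hr A100 rate, and $2-4/hr H100 range are
illustrative assumptions, not measured numbers):

```python
def compute_cost_usd(gpu_hours, usd_per_hour):
    """Cloud GPU cost for a rendering or training job."""
    return gpu_hours * usd_per_hour

# 100k PBR renders at ~1 s each on an A100 at ~$2/hr:
render_hours = 100_000 * 1 / 3600               # ~27.8 GPU-hours
render_cost = compute_cost_usd(render_hours, 2.0)  # ~$55

# CHORD-style training: 5.2 days on one H100 at $2-4/hr:
train_hours = 5.2 * 24                          # 124.8 GPU-hours
low = compute_cost_usd(train_hours, 2.0)        # ~$250
high = compute_cost_usd(train_hours, 4.0)       # ~$500
```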
With model sizes under 2 GB (based on the encrypted model files in
Beeble's distribution) and standard encoder-decoder architectures,
the compute cost to train equivalent models from synthetic data is
modest--well within reach of independent researchers or small studios.
This does not mean Beeble's trained weights are worthless. But the
barrier to replication is lower than the marketing suggests,
especially given that the model architectures are standard
open-source frameworks and equivalent training data is now publicly
available.
## 5. Relighting
**What Beeble claims**: The CVPR paper describes a "Render Net" for
relighting. This is the least well-characterized stage in our
analysis--the relighting model's architecture could not be determined
from the available evidence.
### NVIDIA DiffusionRenderer (replaces both PBR decomposition AND relighting)
This is the most significant recent development. NVIDIA's
DiffusionRenderer does the same thing as Beeble's entire core
pipeline--video to PBR passes plus relighting--in a single open-source
system.
- **DiffusionRenderer** (NVIDIA, CVPR 2025 Oral) -- a general-purpose
method for both neural inverse and forward rendering. Two modes:
- **Inverse**: input image/video → geometry and material buffers
(albedo, normals, roughness, metallic)
- **Forward**: G-buffers + environment map → photorealistic relit
output
The upgraded **Cosmos DiffusionRenderer** (June 2025) brings
improved quality powered by NVIDIA Cosmos video foundation models.
GitHub: https://github.com/nv-tlabs/cosmos-transfer1-diffusion-renderer
Academic version: https://github.com/nv-tlabs/diffusion-renderer
Weights: https://huggingface.co/collections/zianw/cosmos-diffusionrenderer-6849f2a4da267e55409b8125
**License: Apache 2.0 (code), NVIDIA Open Model License (weights)**
Hardware: approximately 16GB VRAM recommended.
**ComfyUI integration**: A community wrapper exists at
https://github.com/eggsbenedicto/DiffusionRenderer-ComfyUI
(experimental, Linux tested). Requires downloading the Cosmos
DiffusionRenderer checkpoints and NVIDIA Video Tokenizer
(Cosmos-1.0-Tokenizer-CV8x8x8).
This is a direct, open-source replacement for Beeble's core value
proposition, backed by NVIDIA's resources and presented as an oral
paper at CVPR 2025.
### IC-Light (image relighting)
- **IC-Light** (ICLR 2025, by lllyasviel / ControlNet creator) --
the leading open-source image relighting model. Two modes:
text-conditioned (describe the target lighting) and
background-conditioned (provide a background image whose lighting
should be matched). Based on Stable Diffusion. V2 available with
16-channel VAE.
GitHub: https://github.com/lllyasviel/IC-Light
IC-Light uses diffusion-based lighting transfer rather than
physics-based rendering. The results look different--less physically
precise but more flexible in terms of creative lighting scenarios.
Available in ComfyUI via multiple community node packages.
### Manual relighting with PBR passes
If you have normal, albedo, roughness, and metallic maps from steps
3-4, you can do relighting directly in any 3D application:
- **Blender**: Import the passes as textures on a plane, apply a
Principled BSDF shader, and light the scene with any HDRI or light
setup. This gives you full artistic control.
- **Nuke**: Use the PBR passes with Nuke's relighting nodes for
compositing-native workflows.
- **Unreal Engine**: Import as material textures for real-time PBR
rendering.
This approach is arguably more powerful than SwitchLight for
professional VFX work because you have complete control over the
lighting. The tradeoff is that it requires manual setup rather than
one-click processing.
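The manual approach boils down to standard shading math. As an
illustration of what the 3D package does with the extracted passes,
here is Lambertian (diffuse-only) shading of one pixel from a normal
and a light direction -- a toy model that ignores the roughness,
metallic, and specular terms a Principled BSDF adds on top:

```python
import math

def lambert_shade(albedo, normal, light_dir, light_color=(1.0, 1.0, 1.0)):
    """Diffuse shading for one pixel: albedo * light * max(N.L, 0).

    `normal` and `light_dir` are 3-vectors; normals come from the
    extracted normal pass (decoded from [0, 255] into [-1, 1])."""
    nlen = math.sqrt(sum(c * c for c in normal)) or 1.0
    llen = math.sqrt(sum(c * c for c in light_dir)) or 1.0
    ndotl = sum((n / nlen) * (l / llen) for n, l in zip(normal, light_dir))
    k = max(ndotl, 0.0)  # surfaces facing away receive no light
    return tuple(a * c * k for a, c in zip(albedo, light_color))
```

A pixel facing the light keeps its full albedo; one facing away goes
to black. Changing `light_dir` and `light_color` per pixel is, in
essence, what relighting with PBR passes does.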
## 6. Feature extraction and segmentation
**What Beeble uses**: DINOv2 via timm (feature extraction),
segmentation_models_pytorch (segmentation)
These are intermediate pipeline components in Beeble's architecture.
DINOv2 produces feature maps that feed into other models, and the
segmentation model likely handles scene parsing or material
classification.
Most users replicating Beeble's outputs will not need these directly.
StableNormal already uses DINOv2 features internally, and CHORD
handles its own segmentation. If you do need them:
```bash
pip install timm segmentation-models-pytorch
```
```python
import timm
import torch

# DINOv2 ViT-L/14; this variant expects 518x518 inputs and is used
# as a frozen feature extractor, not a classifier.
model = timm.create_model('vit_large_patch14_dinov2.lvd142m',
                          pretrained=True).eval()
with torch.no_grad():
    features = model.forward_features(torch.randn(1, 3, 518, 518))
```
## Comparison with Beeble
| Pipeline stage | Beeble model | Open-source equivalent | Parity |
|-------------|-------------|----------------------|--------|
| Person detection | RT-DETR (open source) | RT-DETR / YOLOv8 | Identical (same model) |
| Face detection | Kornia face detection (open source) | Kornia / RetinaFace | Identical (same model) |
| Tracking | BoxMOT (open source) | BoxMOT / ByteTrack | Identical (same model) |
| Alpha matte | InSPyReNet (open source) | InSPyReNet / BiRefNet | Identical (same model) |
| Depth map | Depth Anything V2 (open source) | Depth Anything V2 | Identical (same model) |
| Edge detection | DexiNed (open source) | DexiNed | Identical (same model) |
| Normal map | SMP + timm backbone (proprietary weights) | StableNormal / NormalCrafter | Comparable or better |
| Base color | SMP + timm backbone (proprietary weights) | CHORD / RGB↔X | Weaker for portraits |
| Roughness | SMP + timm backbone (proprietary weights) | CHORD / RGB↔X | Weaker for portraits |
| Metallic | SMP + timm backbone (proprietary weights) | CHORD / RGB↔X | Weaker for portraits |
| Specular | SMP + timm backbone (proprietary weights) | CHORD / RGB↔X | Weaker for portraits |
| Super resolution | RRDB-Net (open source) | ESRGAN / Real-ESRGAN | Identical (same model) |
| Relighting | Proprietary (not fully characterized) | DiffusionRenderer / IC-Light / manual | Comparable (DiffusionRenderer) |
| Full inverse+forward rendering | Entire pipeline | DiffusionRenderer (NVIDIA, CVPR 2025) | Direct open-source competitor |
The "Beeble model" column reflects what was found in the application
binary, not what the CVPR paper describes. See
[REPORT.md](REPORT.md) section 4 for the full architecture analysis.
Where open-source matches or exceeds Beeble: alpha, depth, normals,
detection, tracking, edge detection, and super resolution. Every
preprocessing stage in Beeble's pipeline uses the same open-source
models you can use directly. For video normals, NormalCrafter
provides temporal consistency comparable to Beeble's pipeline.
Where Beeble retains an advantage: PBR material decomposition for
human subjects (base color, roughness, metallic, specular). While the
architecture appears to use standard open-source frameworks, the
model was trained on portrait-specific data. The open-source PBR
models were trained on material textures and interior scenes. However,
as discussed above, the barrier to creating equivalent training data
using synthetic rendering is lower than commonly assumed.
Where DiffusionRenderer changes the picture: NVIDIA's
DiffusionRenderer (CVPR 2025 Oral) handles both inverse rendering
(video → PBR maps) and forward rendering (PBR maps + lighting →
relit output) in a single open-source system. This is the first
open-source tool that directly replicates Beeble's entire core
pipeline, including relighting. It is backed by NVIDIA's resources,
uses Apache 2.0 licensing for code, and has a ComfyUI integration
available.
Where open-source wins on flexibility: manual relighting in
Blender/Nuke with the extracted PBR passes gives full artistic control
that Beeble's automated pipeline does not offer.
## What this means for Beeble users
If you primarily use Beeble for alpha mattes and depth maps, you can
replicate those results for free using the exact same models.
If you use Beeble for normal maps, the open-source alternatives are
now competitive and in some cases better, with NormalCrafter solving
the video temporal consistency problem.
If you use Beeble for full PBR decomposition of portrait footage and
need high-quality material properties, Beeble's model still has an
edge due to its portrait-specific training data. But the gap is
narrowing as models like CHORD improve.
If you use Beeble for one-click relighting, NVIDIA's
DiffusionRenderer is a direct open-source competitor that handles both
PBR decomposition and relighting in a single system. IC-Light provides
a diffusion-based alternative, and manual PBR relighting in
Blender/Nuke gives you full artistic control.
The core value proposition of Beeble Studio--beyond the models
themselves--is convenience. It packages everything into a single
application with a render queue, plugin integrations, and a polished
UX. Replicating the pipeline in ComfyUI requires more setup and
technical knowledge, but costs nothing and gives you full control
over every stage.