Nicholai 86accadc28 docs: add competitive landscape and deep dive findings
Version evolution (SL 1.0→2.0→3.0), team background, no
patents, NVIDIA DiffusionRenderer as open-source competitor,
dataset landscape (POLAR, SynthLight, etc.), botocore/AWS SDK
in privacy app, MetaHuman EULA fix, user data controversy,
and DiffusionRenderer ComfyUI integration across all docs.
2026-01-26 12:41:01 -07:00


Replicating Beeble's Pipeline with Open-Source Tools

Most Beeble Studio users pay for PBR extractions--alpha mattes, depth maps, normal maps--rather than the relighting. The extraction pipeline is built from open-source models, and the PBR decomposition and relighting stages now have viable open-source alternatives too.

This guide documents how to replicate each stage using ComfyUI and direct Python. No workflow JSON files are provided yet, but the relevant nodes, models, and tradeoffs are documented below.

If you are unfamiliar with ComfyUI, see https://docs.comfy.org/

Pipeline overview

Input frame
    |
    +--> Background removal   -->  Alpha matte
    |    (InSPyReNet / BiRefNet)
    |
    +--> Depth estimation     -->  Depth map
    |    (Depth Anything V2)
    |
    +--> Normal estimation    -->  Normal map
    |    (StableNormal / NormalCrafter)
    |
    +--> PBR decomposition    -->  Albedo, Roughness, Metallic
    |    (CHORD / RGB↔X)
    |
    +--> Relighting           -->  Relit output
         (IC-Light / manual in Blender/Nuke)

The first two stages use the exact same models Beeble uses. The remaining stages use different models that produce comparable outputs.

1. Background removal (Alpha matte)

What Beeble uses: transparent-background / InSPyReNet (MIT)

This is the simplest stage. Several ComfyUI nodes wrap the same underlying model:

  • ComfyUI-InSPyReNet -- wraps the same transparent-background library Beeble uses. Install via ComfyUI Manager.
  • ComfyUI-BiRefNet -- uses BiRefNet, a newer model that often produces sharper edges around hair and fine detail.
  • ComfyUI-RMBG -- BRIA's background removal model, another strong alternative.

For video, connect an image sequence loader to the removal node and export the alpha channel as a separate pass. These models process per-frame, so there is no temporal consistency--but alpha mattes are typically stable enough that this is not a problem.

Direct Python:

# pip install transparent-background
from PIL import Image
from transparent_background import Remover

remover = Remover()
image = Image.open('frame.png').convert('RGB')  # any input frame
alpha = remover.process(image, type='map')      # grayscale matte
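With a matte in hand, compositing over a new background is plain alpha blending. A minimal numpy sketch (the function name, array shapes, and example values are illustrative assumptions, not part of any library above):

```python
import numpy as np

def composite_over(foreground, background, alpha):
    """Alpha-blend a foreground over a background.

    foreground, background: float arrays in [0, 1], shape (H, W, 3).
    alpha: float array in [0, 1], shape (H, W) -- e.g. a matte from
    the background-removal stage, rescaled to [0, 1].
    """
    a = alpha[..., None]  # broadcast the matte over the channel axis
    return foreground * a + background * (1.0 - a)

# Tiny worked example: alpha 1 keeps the foreground, alpha 0 the background.
fg = np.full((2, 2, 3), 0.8)
bg = np.zeros((2, 2, 3))
alpha = np.array([[1.0, 0.0],
                  [0.5, 0.5]])
out = composite_over(fg, bg, alpha)
```

For video, run the same blend per frame; since the matte is a separate pass, you can also hand it straight to Nuke or Blender's compositor instead.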

2. Depth estimation

What Beeble uses: Depth Anything V2 via Kornia (Apache 2.0)

  • ComfyUI-DepthAnythingV2 -- dedicated nodes for all model sizes.
  • comfyui_controlnet_aux -- includes Depth Anything V2 as a preprocessor option.

Use the large variant for best quality. This is a per-frame model with no temporal information, but monocular depth tends to be stable across frames for most footage.

Direct Python:

# pip install kornia
from kornia.contrib import DepthAnything

model = DepthAnything.from_pretrained("depth-anything-v2-large")
depth = model(image_tensor)  # image_tensor: (B, 3, H, W) float in [0, 1]
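Monocular models output relative depth at an arbitrary scale, so the map usually needs normalizing before it is useful as a compositing pass. A small numpy sketch (the function name and per-shot strategy are illustrative, not part of any tool above); note that normalizing per frame can flicker on video, so for shots, normalize with a min/max computed over the whole clip:

```python
import numpy as np

def normalize_depth(depth, invert=False):
    """Rescale a relative depth map to [0, 1] for export.

    Monocular estimators output relative depth with an arbitrary
    scale, so the map must be normalized (per frame, or per shot
    for video) before it is usable as a compositing pass.
    """
    d = depth.astype(np.float64)
    d = (d - d.min()) / (d.max() - d.min() + 1e-8)
    return 1.0 - d if invert else d

depth = np.array([[2.0, 4.0],
                  [6.0, 10.0]])
norm = normalize_depth(depth)  # min maps to 0.0, max to 1.0
# For a 16-bit PNG pass: (norm * 65535).astype(np.uint16)
```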

3. Normal estimation

What Beeble claims: The CVPR 2024 paper describes a dedicated "Normal Net" within SwitchLight. However, analysis of the deployed application found no evidence of this specific architecture--the PBR models appear to use standard encoder-decoder segmentation frameworks with pretrained backbones (see REPORT.md section 4 for details).

Multiple open-source models now produce high-quality surface normals from single images, and one handles video with temporal consistency.

For single images

  • StableNormal (SIGGRAPH Asia 2024) -- currently posts the strongest benchmark results for monocular normal estimation. Uses a two-stage coarse-to-fine strategy with DINOv2 semantic features for guidance. A turbo variant runs 10x faster with minimal quality loss. GitHub: https://github.com/Stable-X/StableNormal

  • DSINE (CVPR 2024) -- discriminative CNN-based approach. No diffusion overhead, so it is fast. Competitive with StableNormal on NYUv2 benchmarks. Good choice when inference speed matters. GitHub: https://github.com/baegwangbin/DSINE

  • GeoWizard (ECCV 2024) -- jointly predicts depth AND normals from a single image, which guarantees geometric consistency between the two. Available in ComfyUI via ComfyUI-Geowizard. GitHub: https://github.com/fuxiao0719/GeoWizard
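Whichever model you pick, the raw output is a field of per-pixel unit vectors; to save it as an image pass, it is conventionally remapped from [-1, 1] to [0, 1] per channel. A sketch of that standard encoding (be aware that axis sign conventions, e.g. the green channel, vary between models):

```python
import numpy as np

def encode_normals(normals):
    """Map unit normal vectors from [-1, 1] to the [0, 1] image range.

    normals: (H, W, 3) float array of unit vectors, as produced by
    monocular normal estimators. The conventional image encoding is
    rgb = (n + 1) / 2, so a flat, camera-facing surface (0, 0, 1)
    becomes the familiar lavender (0.5, 0.5, 1.0).
    """
    return (normals + 1.0) * 0.5

flat = np.zeros((1, 1, 3))
flat[..., 2] = 1.0              # camera-facing normal
rgb = encode_normals(flat)      # -> (0.5, 0.5, 1.0)
```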

For video (temporally consistent normals)

  • NormalCrafter (2025) -- this is the most relevant model for replicating Beeble's video pipeline. It uses video diffusion priors to produce temporally consistent normal maps across frames, directly comparable to SwitchLight 3.0's "true video model" claim. Has ComfyUI nodes via ComfyUI-NormalCrafterWrapper.
    GitHub: https://github.com/AIWarper/ComfyUI-NormalCrafterWrapper
    Paper: https://arxiv.org/abs/2504.11427

    Key parameters for ComfyUI:

    • window_size: number of frames processed together (default 14). Larger = better temporal consistency, more VRAM.
    • time_step_size: how far the window slides. Set smaller than window_size for overlapping windows and smoother transitions.
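The interaction of these two parameters can be illustrated in a few lines of Python. This is a sketch of the general sliding-window scheme, not NormalCrafter's actual implementation:

```python
def sliding_windows(num_frames, window_size=14, time_step_size=7):
    """Enumerate the frame windows a sliding-window video model processes.

    With time_step_size < window_size, consecutive windows overlap;
    blending predictions on the overlapping frames is what smooths
    the transitions between windows.
    """
    windows, start = [], 0
    while True:
        end = min(start + window_size, num_frames)
        windows.append(range(start, end))
        if end == num_frames:
            break
        start += time_step_size
    return windows

wins = sliding_windows(30, window_size=14, time_step_size=7)
# Windows cover frames 0-13, 7-20, 14-27, 21-29; frames 7-13
# fall in two windows, so their outputs can be cross-faded.
```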

Assessment: For static images, StableNormal likely matches or exceeds Beeble's normal quality, since it is a specialized model rather than one sub-network within a larger system. For video, NormalCrafter addresses the temporal consistency problem that was previously a key differentiator of Beeble's pipeline.

4. PBR material decomposition (Albedo, Roughness, Metallic)

What Beeble claims: The CVPR 2024 paper describes a "Specular Net" and analytical albedo derivation using a Cook-Torrance reflectance model. Analysis of the deployed application found no Cook-Torrance, BRDF, or physics-based rendering terminology in the binary. The PBR models appear to use standard segmentation architectures (segmentation_models_pytorch with pretrained backbones) trained on proprietary portrait data. See REPORT.md section 4.

Regardless of how Beeble implements PBR decomposition, this is the hardest stage to replicate with open-source tools. Beeble's model was trained on portrait and human subject data. The open-source alternatives were trained on different data, which affects quality for human subjects.

Available models

  • CHORD (Ubisoft La Forge, SIGGRAPH Asia 2025) -- the most complete open-source option. Decomposes a single image into base color, normal, height, roughness, and metalness using chained diffusion. Has official ComfyUI nodes from Ubisoft. Weights on HuggingFace (Ubisoft/ubisoft-laforge-chord).
    GitHub: https://github.com/ubisoft/ComfyUI-Chord
    License: Research-only (Ubisoft ML License)

    Limitation: trained on the MatSynth dataset (~5700 PBR materials), which is texture/material focused. Results on human skin, hair, and clothing will be plausible but not specifically optimized for portrait data. The authors note metalness prediction is notably difficult.

  • RGB↔X (Adobe, SIGGRAPH 2024) -- decomposes into albedo, roughness, metallicity, normals, AND estimates lighting. Trained on interior scenes. Fully open-source code and weights. Minimum 12GB VRAM recommended.
    GitHub: https://github.com/zheng95z/rgbx

    Limitation: trained on interior scene data, not portrait/human data. The albedo estimation for rooms and furniture is strong; for human subjects it is less well-characterized.

  • PBRify Remix -- simpler model for generating PBR maps from diffuse textures. Trained on CC0 data from ambientCG, so no license concerns. Designed for game texture upscaling rather than photographic decomposition. GitHub: https://github.com/Kim2091/PBRify_Remix
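Whichever decomposer you use, the outputs arrive as separate grayscale maps, and downstream tools often expect them packed into a single texture, for example the glTF occlusion-roughness-metallic (ORM) convention. A minimal numpy sketch (the function name is mine; the channel order is the glTF convention):

```python
import numpy as np

def pack_orm(occlusion, roughness, metallic):
    """Pack separate grayscale maps into one ORM texture.

    Follows the glTF convention: occlusion -> R, roughness -> G,
    metallic -> B. All inputs are (H, W) float arrays in [0, 1].
    """
    return np.stack([occlusion, roughness, metallic], axis=-1)

h = w = 4
ao = np.ones((h, w))          # no occlusion estimated -> white
rough = np.full((h, w), 0.6)  # fairly rough dielectric
metal = np.zeros((h, w))      # non-metal
orm = pack_orm(ao, rough, metal)  # shape (4, 4, 3), ready to save
```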

The honest gap

Beeble's PBR model was trained on portrait and human subject data (likely lightstage captures, based on the CVPR paper). The open-source alternatives were trained on material textures or interior scenes. For portrait work, this means:

  • Skin subsurface scattering properties will be better captured by Beeble's model
  • Hair specularity and anisotropy are hard for general-purpose models
  • Clothing material properties (roughness, metallic) should be comparable

For non-portrait subjects (products, environments, objects), the open-source models may actually perform better since they were trained on more diverse material data.

If your goal is manual relighting in Blender or Nuke rather than automated AI relighting, "good enough" PBR passes are often sufficient because you have artistic control over the final result.

On training data and the "moat"

The CVPR paper frames lightstage training data as a significant competitive advantage. This deserves scrutiny from VFX professionals.

For PBR decomposition training, what you actually need is a dataset of images paired with ground-truth PBR maps--albedo, normal, roughness, metallic. Physical lightstage captures are one way to obtain this data, but modern synthetic rendering provides the same thing more cheaply and at greater scale:

  • Blender character generators (Human Generator, MB-Lab, MPFB2): produce characters with known material properties that can be rendered procedurally. Blender's Cycles renderer outputs physically accurate PBR passes natively. Fully open source, no licensing restrictions for AI training.
  • Houdini procedural pipelines: can generate hundreds of thousands of unique character/lighting/pose combinations programmatically.
  • Unreal Engine MetaHumans: photorealistic digital humans with full PBR material definitions. However, the MetaHuman EULA explicitly prohibits using MetaHumans as AI training data: "You must ensure that your activities with the Licensed Technology do not result in using the Licensed Technology as a training input or prompt-based input into any Generative AI Program." MetaHumans can be used within AI-enhanced workflows but not to train AI models.

The ground truth is inherent in synthetic rendering: you created the scene, so you already have the PBR maps. A VFX studio with a standard character pipeline could generate a training dataset in a week.

Existing datasets and published results

The lightstage data advantage that the CVPR paper frames as a competitive moat was real in 2023-2024. That is no longer the case.

Public OLAT datasets now rival Beeble's scale:

  • POLAR (Dec 2025, public) -- 220 subjects, 156 light directions, 32 views, 4K, 28.8 million images total. Beeble's CVPR paper reports 287 subjects; POLAR offers 77% of that subject count and is freely available. https://rex0191.github.io/POLAR/

  • HumanOLAT (ICCV 2025, public gated) -- 21 subjects, full body, 40 cameras at 6K, 331 LEDs. The first public full-body OLAT dataset. https://vcai.mpi-inf.mpg.de/projects/HumanOLAT/

Synthetic approaches already match lightstage quality:

  • SynthLight (Adobe/Yale, CVPR 2025) -- trained purely on ~350 synthetic 3D heads rendered in Blender with PBR materials. Achieves results comparable to lightstage-trained methods on lightstage test data. No lightstage data used at all. https://vrroom.github.io/synthlight/

  • NVIDIA Lumos (SIGGRAPH Asia 2022) -- rendered 300k synthetic samples in a virtual lightstage. Matched state-of-the-art lightstage methods three years ago.

  • OpenHumanBRDF (July 2025) -- 147 human models with full PBR decomposition including SSS, built in Blender. Exactly the kind of dataset needed for training PBR decomposition models. https://arxiv.org/abs/2507.18385

Cost to replicate: Generating a competitive synthetic dataset costs approximately $4,500-$18,000 total (Blender + MPFB2 for character generation, Cycles for rendering, cloud GPUs for compute). Raw GPU compute for 100k PBR renders is approximately $55 on an A100. CHORD (Ubisoft) trained its PBR decomposition model in 5.2 days on a single H100, costing approximately $260-500 in compute.

With model sizes under 2 GB (based on the encrypted model files in Beeble's distribution) and standard encoder-decoder architectures, the compute cost to train equivalent models from synthetic data is modest--well within reach of independent researchers or small studios.

This does not mean Beeble's trained weights are worthless. But the barrier to replication is lower than the marketing suggests, especially given that the model architectures are standard open-source frameworks and equivalent training data is now publicly available.

5. Relighting

What Beeble claims: The CVPR paper describes a "Render Net" for relighting. This is the least well-characterized stage in our analysis--the relighting model's architecture could not be determined from the available evidence.

NVIDIA DiffusionRenderer (replaces both PBR decomposition AND relighting)

This is the most significant recent development. NVIDIA's DiffusionRenderer does the same thing as Beeble's entire core pipeline--video to PBR passes plus relighting--in a single open-source system.

IC-Light (image relighting)

  • IC-Light (ICLR 2025, by lllyasviel / ControlNet creator) -- the leading open-source image relighting model. Two modes: text-conditioned (describe the target lighting) and background-conditioned (provide a background image whose lighting should be matched). Based on Stable Diffusion. V2 available with 16-channel VAE. GitHub: https://github.com/lllyasviel/IC-Light

    IC-Light uses diffusion-based lighting transfer rather than physics-based rendering. The results look different--less physically precise but more flexible in terms of creative lighting scenarios.

    Available in ComfyUI via multiple community node packages.

Manual relighting with PBR passes

If you have normal, albedo, roughness, and metallic maps from steps 3-4, you can do relighting directly in any 3D application:

  • Blender: Import the passes as textures on a plane, apply a Principled BSDF shader, and light the scene with any HDRI or light setup. This gives you full artistic control.
  • Nuke: Use the PBR passes with Nuke's relighting nodes for compositing-native workflows.
  • Unreal Engine: Import as material textures for real-time PBR rendering.

This approach is arguably more powerful than SwitchLight for professional VFX work because you have complete control over the lighting. The tradeoff is that it requires manual setup rather than one-click processing.
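Under the hood, all three applications evaluate a shading model per pixel from the extracted passes. For intuition, here is a diffuse-only (Lambert) relighting sketch in numpy; a real renderer adds a specular lobe driven by the roughness and metallic passes, which this deliberately omits:

```python
import numpy as np

def relight_lambert(albedo, normals, light_dir, light_color=(1.0, 1.0, 1.0)):
    """Minimal diffuse relighting from extracted PBR passes.

    albedo:    (H, W, 3) float array in [0, 1]
    normals:   (H, W, 3) unit vectors (decoded from the normal pass)
    light_dir: direction *toward* the light; normalized internally
    """
    l = np.asarray(light_dir, dtype=np.float64)
    l = l / np.linalg.norm(l)
    n_dot_l = np.clip(normals @ l, 0.0, None)  # (H, W) Lambert term
    return albedo * np.asarray(light_color) * n_dot_l[..., None]

# A camera-facing pixel lit head-on receives its full albedo.
albedo = np.full((1, 1, 3), 0.5)
normals = np.zeros((1, 1, 3))
normals[..., 2] = 1.0
out = relight_lambert(albedo, normals, light_dir=(0.0, 0.0, 1.0))
```

This is exactly what a Principled BSDF with roughness 1.0 and a single sun lamp reduces to, which is why decent albedo and normal passes alone already get you usable relights.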

6. Feature extraction and segmentation

What Beeble uses: DINOv2 via timm (feature extraction), segmentation_models_pytorch (segmentation)

These are intermediate pipeline components in Beeble's architecture. DINOv2 produces feature maps that feed into other models, and the segmentation model likely handles scene parsing or material classification.

Most users replicating Beeble's outputs will not need these directly. StableNormal already uses DINOv2 features internally, and CHORD handles its own segmentation. If you do need them:

# pip install timm segmentation-models-pytorch
import timm

model = timm.create_model('vit_large_patch14_dinov2.lvd142m',
                          pretrained=True)

Comparison with Beeble

Pipeline stage                 | Beeble model                               | Open-source equivalent                 | Parity
Person detection               | RT-DETR (open source)                      | RT-DETR / YOLOv8                       | Identical (same model)
Face detection                 | Kornia face detection (open source)        | Kornia / RetinaFace                    | Identical (same model)
Tracking                       | BoxMOT (open source)                       | BoxMOT / ByteTrack                     | Identical (same model)
Alpha matte                    | InSPyReNet (open source)                   | InSPyReNet / BiRefNet                  | Identical (same model)
Depth map                      | Depth Anything V2 (open source)            | Depth Anything V2                      | Identical (same model)
Edge detection                 | DexiNed (open source)                      | DexiNed                                | Identical (same model)
Normal map                     | SMP + timm backbone (proprietary weights)  | StableNormal / NormalCrafter           | Comparable or better
Base color                     | SMP + timm backbone (proprietary weights)  | CHORD / RGB-X                          | Weaker for portraits
Roughness                      | SMP + timm backbone (proprietary weights)  | CHORD / RGB-X                          | Weaker for portraits
Metallic                       | SMP + timm backbone (proprietary weights)  | CHORD / RGB-X                          | Weaker for portraits
Specular                       | SMP + timm backbone (proprietary weights)  | CHORD / RGB-X                          | Weaker for portraits
Super resolution               | RRDB-Net (open source)                     | ESRGAN / Real-ESRGAN                   | Identical (same model)
Relighting                     | Proprietary (not fully characterized)      | DiffusionRenderer / IC-Light / manual  | Comparable (DiffusionRenderer)
Full inverse+forward rendering | Entire pipeline                            | DiffusionRenderer (NVIDIA, CVPR 2025)  | Direct open-source competitor

The "Beeble model" column reflects what was found in the application binary, not what the CVPR paper describes. See REPORT.md section 4 for the full architecture analysis.

Where open-source matches or exceeds Beeble: alpha, depth, normals, detection, tracking, edge detection, and super resolution. Every preprocessing stage in Beeble's pipeline uses the same open-source models you can use directly. For video normals, NormalCrafter provides temporal consistency comparable to Beeble's pipeline.

Where Beeble retains an advantage: PBR material decomposition for human subjects (base color, roughness, metallic, specular). While the architecture appears to use standard open-source frameworks, the model was trained on portrait-specific data. The open-source PBR models were trained on material textures and interior scenes. However, as discussed above, the barrier to creating equivalent training data using synthetic rendering is lower than commonly assumed.

Where DiffusionRenderer changes the picture: NVIDIA's DiffusionRenderer (CVPR 2025 Oral) handles both inverse rendering (video → PBR maps) and forward rendering (PBR maps + lighting → relit output) in a single open-source system. This is the first open-source tool that directly replicates Beeble's entire core pipeline, including relighting. It is backed by NVIDIA's resources, uses Apache 2.0 licensing for code, and has a ComfyUI integration available.

Where open-source wins on flexibility: manual relighting in Blender/Nuke with the extracted PBR passes gives full artistic control that Beeble's automated pipeline does not offer.

What this means for Beeble users

If you primarily use Beeble for alpha mattes and depth maps, you can replicate those results for free using the exact same models.

If you use Beeble for normal maps, the open-source alternatives are now competitive and in some cases better, with NormalCrafter solving the video temporal consistency problem.

If you use Beeble for full PBR decomposition of portrait footage and need high-quality material properties, Beeble's model still has an edge due to its portrait-specific training data. But the gap is narrowing as models like CHORD improve.

If you use Beeble for one-click relighting, NVIDIA's DiffusionRenderer is a direct open-source competitor that handles both PBR decomposition and relighting in a single system. IC-Light provides a diffusion-based alternative, and manual PBR relighting in Blender/Nuke gives you full artistic control.

The core value proposition of Beeble Studio--beyond the models themselves--is convenience. It packages everything into a single application with a render queue, plugin integrations, and a polished UX. Replicating the pipeline in ComfyUI requires more setup and technical knowledge, but costs nothing and gives you full control over every stage.