Replicating Beeble's Pipeline with Open-Source Tools
Most Beeble Studio users pay for the extraction passes--alpha mattes, depth maps, normal maps--rather than for the relighting. The extraction pipeline is built from open-source models, and the PBR decomposition and relighting stages now have viable open-source alternatives too.
This guide documents how to replicate each stage using ComfyUI and direct Python. No workflow JSON files are provided yet, but the relevant nodes, models, and tradeoffs are documented below.
If you are unfamiliar with ComfyUI, see https://docs.comfy.org/
Pipeline overview
```
Input frame
  |
  +--> Background removal --> Alpha matte
  |    (InSPyReNet / BiRefNet)
  |
  +--> Depth estimation --> Depth map
  |    (Depth Anything V2)
  |
  +--> Normal estimation --> Normal map
  |    (StableNormal / NormalCrafter)
  |
  +--> PBR decomposition --> Albedo, Roughness, Metallic
  |    (CHORD / RGB↔X)
  |
  +--> Relighting --> Relit output
       (IC-Light / manual in Blender/Nuke)
```
The first two stages use the exact same models Beeble uses. The remaining stages use different models that produce comparable outputs.
1. Background removal (Alpha matte)
What Beeble uses: transparent-background / InSPyReNet (MIT)
This is the simplest stage. Several ComfyUI nodes wrap the same underlying model:
- ComfyUI-InSPyReNet -- wraps the same transparent-background library Beeble uses. Install via ComfyUI Manager.
- ComfyUI-BiRefNet -- uses BiRefNet, a newer model that often produces sharper edges around hair and fine detail.
- ComfyUI-RMBG -- BRIA's background removal model, another strong alternative.
For video, connect an image sequence loader to the removal node and export the alpha channel as a separate pass. These models process per-frame, so there is no temporal consistency--but alpha mattes are typically stable enough that this is not a problem.
Direct Python:
```
pip install transparent-background
```

```python
from PIL import Image
from transparent_background import Remover

image = Image.open("frame_0001.png").convert("RGB")
remover = Remover()
alpha = remover.process(image, type='map')  # type='map' returns the matte itself rather than a composite
```
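What the matte is for: compositing the extracted subject over a new plate is a per-pixel linear blend. A minimal sketch in plain Python (a real pipeline would do this with NumPy or a merge node; `composite` and its flat pixel lists are illustrative, not part of transparent-background):

```python
def composite(fg, bg, alpha):
    """Alpha-composite foreground over background.

    fg, bg: lists of (r, g, b) tuples; alpha: list of floats in [0, 1].
    out = fg * a + bg * (1 - a), applied per channel.
    """
    out = []
    for (fr, fgr, fb), (br, bgr, bb), a in zip(fg, bg, alpha):
        out.append((
            fr * a + br * (1 - a),
            fgr * a + bgr * (1 - a),
            fb * a + bb * (1 - a),
        ))
    return out

# A fully opaque pixel keeps the foreground; a transparent one keeps the background.
pixels = composite([(255, 0, 0)], [(0, 0, 255)], [1.0])
```

Exporting the alpha as its own pass, as suggested above, lets you defer this blend to the compositor instead of baking it in.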
2. Depth estimation
What Beeble uses: Depth Anything V2 via Kornia (Apache 2.0)
- ComfyUI-DepthAnythingV2 -- dedicated nodes for all model sizes.
- comfyui_controlnet_aux -- includes Depth Anything V2 as a preprocessor option.
Use the large variant for best quality. This is a per-frame model
with no temporal information, but monocular depth tends to be stable
across frames for most footage.
Direct Python:
```
pip install kornia
```

```python
from kornia.contrib import DepthAnything  # import path may vary across kornia versions

model = DepthAnything.from_pretrained("depth-anything-v2-large")
depth = model(image_tensor)  # image_tensor: (B, 3, H, W) float tensor in [0, 1]
```
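Depth Anything V2 predicts relative depth, so the raw values need normalizing before they can be written out as an image pass. A minimal min-max sketch in plain Python (`normalize_depth` is illustrative; in practice you would normalize the tensor directly):

```python
def normalize_depth(depth, bit_depth=8):
    """Min-max normalize relative depth values into an image pass's integer range."""
    lo, hi = min(depth), max(depth)
    scale = (2 ** bit_depth - 1) / (hi - lo) if hi > lo else 0.0
    return [round((d - lo) * scale) for d in depth]

# Three relative depth values mapped into 0..255
print(normalize_depth([0.2, 0.5, 0.8]))
```

For compositing work, writing a 16-bit pass (`bit_depth=16`) preserves more gradation than 8-bit.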
3. Normal estimation
What Beeble claims: The CVPR 2024 paper describes a dedicated "Normal Net" within SwitchLight. However, analysis of the deployed application found no evidence of this specific architecture--the PBR models appear to use standard encoder-decoder segmentation frameworks with pretrained backbones (see REPORT.md section 4 for details).
Multiple open-source models now produce high-quality surface normals from single images, and one handles video with temporal consistency.
For single images
- StableNormal (SIGGRAPH Asia 2024) -- currently the best benchmark results for monocular normal estimation. Uses a two-stage coarse-to-fine strategy with DINOv2 semantic features for guidance. A turbo variant runs 10x faster with minimal quality loss. GitHub: https://github.com/Stable-X/StableNormal
- DSINE (CVPR 2024) -- discriminative CNN-based approach. No diffusion overhead, so it is fast. Competitive with StableNormal on NYUv2 benchmarks. A good choice when inference speed matters. GitHub: https://github.com/baegwangbin/DSINE
- GeoWizard (ECCV 2024) -- jointly predicts depth AND normals from a single image, which guarantees geometric consistency between the two. Available in ComfyUI via ComfyUI-Geowizard. GitHub: https://github.com/fuxiao0719/GeoWizard
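Whichever model you pick, the output normal map is usually an RGB image in which each channel encodes one component of a unit vector mapped from [-1, 1] to [0, 255]. A decode sketch under that common convention (check each model's README, since sign conventions differ between OpenGL- and DirectX-style maps):

```python
def decode_normal(r, g, b):
    """Decode an 8-bit normal-map pixel to a unit-ish vector.

    Each channel maps [0, 255] -> [-1, 1]; (128, 128, 255) is the
    "flat, facing the camera" normal (0, 0, 1) up to quantization.
    """
    return tuple(c / 255.0 * 2.0 - 1.0 for c in (r, g, b))

n = decode_normal(128, 128, 255)  # approximately (0, 0, 1)
```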
For video (temporally consistent normals)
- NormalCrafter (2025) -- the most relevant model for replicating Beeble's video pipeline. It uses video diffusion priors to produce temporally consistent normal maps across frames, directly comparable to SwitchLight 3.0's "true video model" claim. Has ComfyUI nodes via ComfyUI-NormalCrafterWrapper. GitHub: https://github.com/AIWarper/ComfyUI-NormalCrafterWrapper Paper: https://arxiv.org/abs/2504.11427
Key parameters for ComfyUI:

- window_size: number of frames processed together (default 14). Larger = better temporal consistency, more VRAM.
- time_step_size: how far the window slides. Set smaller than window_size for overlapping windows and smoother transitions.
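How the two parameters interact is easiest to see by listing the frame windows they produce. A sketch of a typical overlapping-window schedule (an assumption about how such wrappers slide the window, not NormalCrafter's exact code):

```python
def window_schedule(num_frames, window_size=14, time_step_size=10):
    """List the (start, end) frame ranges a sliding window covers.

    time_step_size < window_size makes consecutive windows overlap,
    which is what smooths transitions between them.
    """
    windows = []
    start = 0
    while start < num_frames:
        end = min(start + window_size, num_frames)
        windows.append((start, end))
        if end == num_frames:
            break
        start += time_step_size
    return windows

# 30 frames with a 14-frame window sliding by 10 gives three overlapping windows
print(window_schedule(30))
```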
Assessment: For static images, StableNormal likely matches or exceeds Beeble's normal quality, since it is a specialized model rather than one sub-network within a larger system. For video, NormalCrafter addresses the temporal consistency problem that was previously a key differentiator of Beeble's pipeline.
4. PBR material decomposition (Albedo, Roughness, Metallic)
What Beeble claims: The CVPR 2024 paper describes a "Specular Net" and analytical albedo derivation using a Cook-Torrance reflectance model. Analysis of the deployed application found no Cook-Torrance, BRDF, or physics-based rendering terminology in the binary. The PBR models appear to use standard segmentation architectures (segmentation_models_pytorch with pretrained backbones) trained on proprietary portrait data. See REPORT.md section 4.
Regardless of how Beeble implements PBR decomposition, this is the hardest stage to replicate with open-source tools. Beeble's model was trained on portrait and human subject data. The open-source alternatives were trained on different data, which affects quality for human subjects.
Available models
- CHORD (Ubisoft La Forge, SIGGRAPH Asia 2025) -- the most complete open-source option. Decomposes a single image into base color, normal, height, roughness, and metalness using chained diffusion. Has official ComfyUI nodes from Ubisoft. Weights on HuggingFace (Ubisoft/ubisoft-laforge-chord). GitHub: https://github.com/ubisoft/ComfyUI-Chord
  License: Research-only (Ubisoft ML License)
  Limitation: trained on the MatSynth dataset (~5700 PBR materials), which is texture/material focused. Results on human skin, hair, and clothing will be plausible but not specifically optimized for portrait data. The authors note metalness prediction is notably difficult.
- RGB↔X (Adobe, SIGGRAPH 2024) -- decomposes into albedo, roughness, metallicity, normals, AND estimates lighting. Trained on interior scenes. Fully open-source code and weights. GitHub: https://github.com/zheng95z/rgbx Minimum 12GB VRAM recommended.
  Limitation: trained on interior scene data, not portrait/human data. The albedo estimation for rooms and furniture is strong; for human subjects it is less well-characterized.
- PBRify Remix -- a simpler model for generating PBR maps from diffuse textures. Trained on CC0 data from ambientCG, so no license concerns. Designed for game texture upscaling rather than photographic decomposition. GitHub: https://github.com/Kim2091/PBRify_Remix
The honest gap
Beeble's PBR model was trained on portrait and human subject data (likely lightstage captures, based on the CVPR paper). The open-source alternatives were trained on material textures or interior scenes. For portrait work, this means:
- Skin subsurface scattering properties will be better captured by Beeble's model
- Hair specularity and anisotropy are hard for general-purpose models
- Clothing material properties (roughness, metallic) should be comparable
For non-portrait subjects (products, environments, objects), the open-source models may actually perform better since they were trained on more diverse material data.
If your goal is manual relighting in Blender or Nuke rather than automated AI relighting, "good enough" PBR passes are often sufficient because you have artistic control over the final result.
On training data and the "moat"
The CVPR paper frames lightstage training data as a significant competitive advantage. This deserves scrutiny from VFX professionals.
For PBR decomposition training, what you actually need is a dataset of images paired with ground-truth PBR maps--albedo, normal, roughness, metallic. Physical lightstage captures are one way to obtain this data, but modern synthetic rendering provides the same thing more cheaply and at greater scale:
- Unreal Engine MetaHumans: photorealistic digital humans with full PBR material definitions. Render them under varied lighting and you have ground-truth PBR for each frame.
- Blender character generators (Human Generator, MB-Lab): produce characters with known material properties that can be rendered procedurally.
- Houdini procedural pipelines: can generate hundreds of thousands of unique character/lighting/pose combinations programmatically.
The ground truth is inherent in synthetic rendering: you created the scene, so you already have the PBR maps. A VFX studio with a standard character pipeline could generate a training dataset in a week.
With model sizes under 2 GB (based on the encrypted model files in Beeble's distribution) and standard encoder-decoder architectures, the compute cost to train equivalent models from synthetic data is modest--well within reach of independent researchers or small studios.
This does not mean Beeble's trained weights are worthless. But the barrier to replication is lower than the marketing suggests, especially given that the model architectures are standard open-source frameworks.
5. Relighting
What Beeble claims: The CVPR paper describes a "Render Net" for relighting. This is the least well-characterized stage in our analysis--the relighting model's architecture could not be determined from the available evidence.
AI-based relighting
- IC-Light (ICLR 2025, by lllyasviel, creator of ControlNet) -- the leading open-source relighting model. Two modes: text-conditioned (describe the target lighting) and background-conditioned (provide a background image whose lighting should be matched). Based on Stable Diffusion. GitHub: https://github.com/lllyasviel/IC-Light
IC-Light uses diffusion-based lighting transfer rather than physics-based rendering. The results look different--less physically precise but more flexible in terms of creative lighting scenarios.
Available in ComfyUI via multiple community node packages.
Manual relighting with PBR passes
If you have normal, albedo, roughness, and metallic maps from steps 3-4, you can do relighting directly in any 3D application:
- Blender: Import the passes as textures on a plane, apply a Principled BSDF shader, and light the scene with any HDRI or light setup. This gives you full artistic control.
- Nuke: Use the PBR passes with Nuke's relighting nodes for compositing-native workflows.
- Unreal Engine: Import as material textures for real-time PBR rendering.
This approach is arguably more powerful than SwitchLight for professional VFX work because you have complete control over the lighting. The tradeoff is that it requires manual setup rather than one-click processing.
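To see what the shader does with these passes, here is the diffuse term in isolation: a Lambertian-only sketch of relighting one pixel (a Principled BSDF adds specular terms driven by roughness and metallic, which are omitted here):

```python
def lambert_relight(albedo, normal, light_dir, light_color=(1.0, 1.0, 1.0)):
    """Diffuse-only relighting of one pixel from its PBR passes.

    albedo: (r, g, b) in [0, 1]; normal, light_dir: unit 3-vectors.
    out = albedo * light_color * max(0, N . L)
    """
    n_dot_l = max(0.0, sum(n * l for n, l in zip(normal, light_dir)))
    return tuple(a * c * n_dot_l for a, c in zip(albedo, light_color))

# A surface facing the light is fully lit; one facing away goes black.
lit = lambert_relight((0.8, 0.5, 0.3), (0.0, 0.0, 1.0), (0.0, 0.0, 1.0))
dark = lambert_relight((0.8, 0.5, 0.3), (0.0, 0.0, 1.0), (0.0, 0.0, -1.0))
```

This is exactly why the normal map quality from step 3 matters so much: every lighting decision is a function of N.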
6. Feature extraction and segmentation
What Beeble uses: DINOv2 via timm (feature extraction), segmentation_models_pytorch (segmentation)
These are intermediate pipeline components in Beeble's architecture. DINOv2 produces feature maps that feed into other models, and the segmentation model likely handles scene parsing or material classification.
Most users replicating Beeble's outputs will not need these directly. StableNormal already uses DINOv2 features internally, and CHORD handles its own segmentation. If you do need them:
```
pip install timm segmentation-models-pytorch
```

```python
import timm

# The same DINOv2 ViT-L/14 backbone Beeble loads through timm
model = timm.create_model('vit_large_patch14_dinov2.lvd142m', pretrained=True)
features = model.forward_features(image_tensor)  # image_tensor: preprocessed (B, 3, 518, 518) batch
```
Comparison with Beeble
| Pipeline stage | Beeble model | Open-source equivalent | Parity |
|---|---|---|---|
| Person detection | RT-DETR (open source) | RT-DETR / YOLOv8 | Identical (same model) |
| Face detection | Kornia face detection (open source) | Kornia / RetinaFace | Identical (same model) |
| Tracking | BoxMOT (open source) | BoxMOT / ByteTrack | Identical (same model) |
| Alpha matte | InSPyReNet (open source) | InSPyReNet / BiRefNet | Identical (same model) |
| Depth map | Depth Anything V2 (open source) | Depth Anything V2 | Identical (same model) |
| Edge detection | DexiNed (open source) | DexiNed | Identical (same model) |
| Normal map | SMP + timm backbone (proprietary weights) | StableNormal / NormalCrafter | Comparable or better |
| Base color | SMP + timm backbone (proprietary weights) | CHORD / RGB-X | Weaker for portraits |
| Roughness | SMP + timm backbone (proprietary weights) | CHORD / RGB-X | Weaker for portraits |
| Metallic | SMP + timm backbone (proprietary weights) | CHORD / RGB-X | Weaker for portraits |
| Specular | SMP + timm backbone (proprietary weights) | CHORD / RGB-X | Weaker for portraits |
| Super resolution | RRDB-Net (open source) | ESRGAN / Real-ESRGAN | Identical (same model) |
| Relighting | Proprietary (not fully characterized) | IC-Light / manual | Different approach |
The "Beeble model" column reflects what was found in the application binary, not what the CVPR paper describes. See REPORT.md section 4 for the full architecture analysis.
Where open-source matches or exceeds Beeble: alpha, depth, normals, detection, tracking, edge detection, and super resolution. Every preprocessing stage in Beeble's pipeline uses the same open-source models you can use directly. For video normals, NormalCrafter provides temporal consistency comparable to Beeble's pipeline.
Where Beeble retains an advantage: PBR material decomposition for human subjects (base color, roughness, metallic, specular). While the architecture appears to use standard open-source frameworks, the model was trained on portrait-specific data. The open-source PBR models were trained on material textures and interior scenes. However, as discussed above, the barrier to creating equivalent training data using synthetic rendering is lower than commonly assumed.
Where open-source wins on flexibility: manual relighting in Blender/Nuke with the extracted PBR passes gives full artistic control that Beeble's automated pipeline does not offer.
What this means for Beeble users
If you primarily use Beeble for alpha mattes and depth maps, you can replicate those results for free using the exact same models.
If you use Beeble for normal maps, the open-source alternatives are now competitive and in some cases better, with NormalCrafter solving the video temporal consistency problem.
If you use Beeble for full PBR decomposition of portrait footage and need high-quality material properties, Beeble's model still has an edge due to its portrait-specific training data. But the gap is narrowing as models like CHORD improve.
If you use Beeble for one-click relighting, IC-Light provides a different but functional alternative, and manual PBR relighting in Blender/Nuke gives you more control.
The core value proposition of Beeble Studio--beyond the models themselves--is convenience. It packages everything into a single application with a render queue, plugin integrations, and a polished UX. Replicating the pipeline in ComfyUI requires more setup and technical knowledge, but costs nothing and gives you full control over every stage.