# Replicating Beeble's Pipeline with Open-Source Tools

Most Beeble Studio users pay for the PBR extractions--alpha mattes, depth maps, normal maps--rather than for the relighting. The extraction pipeline is built from open-source models, and the PBR decomposition and relighting stages now have viable open-source alternatives too.

This guide documents how to replicate each stage using ComfyUI and direct Python. No workflow JSON files are provided yet, but the relevant nodes, models, and tradeoffs are documented below.

If you are unfamiliar with ComfyUI, see https://docs.comfy.org/

## Pipeline overview

```
Input frame
    |
    +--> Background removal --> Alpha matte
    |    (InSPyReNet / BiRefNet)
    |
    +--> Depth estimation --> Depth map
    |    (Depth Anything V2)
    |
    +--> Normal estimation --> Normal map
    |    (StableNormal / NormalCrafter)
    |
    +--> PBR decomposition --> Albedo, Roughness, Metallic
    |    (CHORD / RGB↔X)
    |
    +--> Relighting --> Relit output
         (IC-Light / manual in Blender/Nuke)
```

The first two stages use the exact same models Beeble uses. The remaining stages use different models that produce comparable outputs.

## 1. Background removal (Alpha matte)

**What Beeble uses**: transparent-background / InSPyReNet (MIT)

This is the simplest stage. Several ComfyUI node packs cover it:

- **ComfyUI-InSPyReNet** -- wraps the same `transparent-background` library Beeble uses. Install via ComfyUI Manager.
- **ComfyUI-BiRefNet** -- uses BiRefNet, a newer model that often produces sharper edges around hair and fine detail.
- **ComfyUI-RMBG** -- BRIA's background removal model, another strong alternative.

For video, connect an image sequence loader to the removal node and export the alpha channel as a separate pass. These models process per-frame, so there is no temporal consistency--but alpha mattes are typically stable enough that this is not a problem.

**Direct Python**:

```bash
pip install transparent-background
```

```python
from transparent_background import Remover

remover = Remover()
alpha = remover.process(image, type='map')  # type='map' returns the matte itself
```
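
For an image sequence, the same per-frame call simply runs in a loop. A minimal sketch of that loop (the directory layout, function name, and the injected `process` callable are illustrative; in practice `process` would be a thin wrapper around `Remover().process` as above):

```python
from pathlib import Path

def extract_alpha_sequence(frame_dir, out_dir, process):
    """Run a per-frame matting callable over a numbered PNG sequence.

    `process(src, dst)` is any function that writes the alpha matte for
    `src` to `dst`. Frames are independent: no temporal state is carried
    between them, matching how these models actually operate on video.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for frame in sorted(Path(frame_dir).glob("*.png")):
        dst = out / f"{frame.stem}_alpha.png"
        process(frame, dst)
        written.append(dst)
    return written
```

Sorting by filename keeps the alpha pass in frame order, so it can be reassembled alongside the source plate in any compositor.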

## 2. Depth estimation

**What Beeble uses**: Depth Anything V2 via Kornia (Apache 2.0)

- **ComfyUI-DepthAnythingV2** -- dedicated nodes for all model sizes.
- **comfyui_controlnet_aux** -- includes Depth Anything V2 as a preprocessor option.

Use the `large` variant for best quality. This is a per-frame model with no temporal information, but monocular depth tends to be stable across frames for most footage.

**Direct Python**:

```bash
pip install kornia
```

```python
# Import path may vary by Kornia version; check your version's docs.
from kornia.contrib import DepthAnything

model = DepthAnything.from_pretrained("depth-anything-v2-large")
depth = model(image_tensor)  # relative depth, arbitrary scale
```
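
Depth Anything V2 outputs relative depth with an arbitrary per-frame scale, so before exporting it as a pass you typically min-max normalize to the target bit depth. A minimal sketch (the function name and defaults are ours, not from any library):

```python
def depth_to_int(depth_values, bit_depth=16):
    """Normalize relative depth values to [0, 2^bit_depth - 1] integers.

    Relative depth has no fixed units, so values are min-max normalized
    before being written out as a 16-bit depth pass.
    """
    lo, hi = min(depth_values), max(depth_values)
    scale = ((1 << bit_depth) - 1) / (hi - lo) if hi > lo else 0.0
    return [round((d - lo) * scale) for d in depth_values]
```

For video, normalize over the whole clip rather than per frame; per-frame normalization makes the depth scale flicker whenever the nearest or farthest object changes.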

## 3. Normal estimation

**What Beeble claims**: The CVPR 2024 paper describes a dedicated "Normal Net" within SwitchLight. However, analysis of the deployed application found no evidence of this specific architecture--the PBR models appear to use standard encoder-decoder segmentation frameworks with pretrained backbones (see [REPORT.md](REPORT.md) section 4 for details).

Multiple open-source models now produce high-quality surface normals from single images, and one handles video with temporal consistency.

### For single images

- **StableNormal** (SIGGRAPH Asia 2024) -- currently the best benchmarks for monocular normal estimation. Uses a two-stage coarse-to-fine strategy with DINOv2 semantic features for guidance. A turbo variant runs 10x faster with minimal quality loss.
  GitHub: https://github.com/Stable-X/StableNormal
- **DSINE** (CVPR 2024) -- discriminative CNN-based approach. No diffusion overhead, so it is fast. Competitive with StableNormal on NYUv2 benchmarks. A good choice when inference speed matters.
  GitHub: https://github.com/baegwangbin/DSINE
- **GeoWizard** (ECCV 2024) -- jointly predicts depth AND normals from a single image, which guarantees geometric consistency between the two. Available in ComfyUI via ComfyUI-Geowizard.
  GitHub: https://github.com/fuxiao0719/GeoWizard

### For video (temporally consistent normals)

- **NormalCrafter** (2025) -- the most relevant model for replicating Beeble's video pipeline. It uses video diffusion priors to produce temporally consistent normal maps across frames, directly comparable to SwitchLight 3.0's "true video model" claim. Has ComfyUI nodes via ComfyUI-NormalCrafterWrapper.
  GitHub: https://github.com/AIWarper/ComfyUI-NormalCrafterWrapper
  Paper: https://arxiv.org/abs/2504.11427

Key parameters for ComfyUI:

- `window_size`: number of frames processed together (default 14). Larger = better temporal consistency, more VRAM.
- `time_step_size`: how far the window slides. Set it smaller than `window_size` for overlapping windows and smoother transitions.
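
To see how the two parameters interact, here is a small sketch (ours, not from the wrapper's source) of the resulting window start positions: consecutive windows overlap by at least `window_size - time_step_size` frames, and the final window is clamped so it never runs past the last frame.

```python
def window_starts(num_frames, window_size=14, time_step_size=10):
    """Start indices of the sliding windows covering a frame sequence."""
    starts = []
    s = 0
    while True:
        # Clamp the last window so it ends exactly at the final frame.
        starts.append(min(s, max(num_frames - window_size, 0)))
        if s + window_size >= num_frames:
            break
        s += time_step_size
    return starts
```

With 30 frames and the defaults, this yields windows starting at frames 0, 10, and 16; setting `time_step_size` equal to `window_size` would make the windows disjoint, which can produce visible seams at window boundaries.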

**Assessment**: For static images, StableNormal likely matches or exceeds Beeble's normal quality, since it is a specialized model rather than one sub-network within a larger system. For video, NormalCrafter addresses the temporal consistency problem that was previously a key differentiator of Beeble's pipeline.

## 4. PBR material decomposition (Albedo, Roughness, Metallic)

**What Beeble claims**: The CVPR 2024 paper describes a "Specular Net" and analytical albedo derivation using a Cook-Torrance reflectance model. Analysis of the deployed application found no Cook-Torrance, BRDF, or physics-based rendering terminology in the binary. The PBR models appear to use standard segmentation architectures (segmentation_models_pytorch with pretrained backbones) trained on proprietary portrait data. See [REPORT.md](REPORT.md) section 4.

Regardless of how Beeble implements PBR decomposition, this is the hardest stage to replicate with open-source tools. Beeble's model was trained on portrait and human subject data. The open-source alternatives were trained on different data, which affects quality for human subjects.

### Available models

- **CHORD** (Ubisoft La Forge, SIGGRAPH Asia 2025) -- the most complete open-source option. Decomposes a single image into base color, normal, height, roughness, and metalness using chained diffusion. Has official ComfyUI nodes from Ubisoft. Weights on HuggingFace (`Ubisoft/ubisoft-laforge-chord`).
  GitHub: https://github.com/ubisoft/ComfyUI-Chord
  **License: Research-only (Ubisoft ML License)**

  Limitation: trained on the MatSynth dataset (~5,700 PBR materials), which is texture/material focused. Results on human skin, hair, and clothing will be plausible but not specifically optimized for portrait data. The authors note that metalness prediction is notably difficult.

- **RGB↔X** (Adobe, SIGGRAPH 2024) -- decomposes into albedo, roughness, metallicity, and normals, AND estimates lighting. Trained on interior scenes. Fully open-source code and weights. Minimum 12GB VRAM recommended.
  GitHub: https://github.com/zheng95z/rgbx

  Limitation: trained on interior scene data, not portrait/human data. The albedo estimation for rooms and furniture is strong; for human subjects it is less well characterized.

- **PBRify Remix** -- a simpler model for generating PBR maps from diffuse textures. Trained on CC0 data from ambientCG, so no license concerns. Designed for game texture upscaling rather than photographic decomposition.
  GitHub: https://github.com/Kim2091/PBRify_Remix

### The honest gap

Beeble's PBR model was trained on portrait and human subject data (likely lightstage captures, based on the CVPR paper). The open-source alternatives were trained on material textures or interior scenes. For portrait work, this means:

- Skin subsurface scattering properties will be better captured by Beeble's model.
- Hair specularity and anisotropy are hard for general-purpose models.
- Clothing material properties (roughness, metallic) should be comparable.

For non-portrait subjects (products, environments, objects), the open-source models may actually perform better, since they were trained on more diverse material data.

If your goal is manual relighting in Blender or Nuke rather than automated AI relighting, "good enough" PBR passes are often sufficient, because you retain artistic control over the final result.

### On training data and the "moat"

The CVPR paper frames lightstage training data as a significant competitive advantage. This deserves scrutiny from VFX professionals.

For PBR decomposition training, what you actually need is a dataset of images paired with ground-truth PBR maps--albedo, normal, roughness, metallic. Physical lightstage captures are one way to obtain this data, but modern synthetic rendering provides the same thing more cheaply and at greater scale:

- **Blender character generators** (Human Generator, MB-Lab, MPFB2): produce characters with known material properties that can be rendered procedurally. Blender's Cycles renderer outputs physically accurate PBR passes natively. Fully open source, with no licensing restrictions for AI training.
- **Houdini procedural pipelines**: can generate hundreds of thousands of unique character/lighting/pose combinations programmatically.
- ~~**Unreal Engine MetaHumans**~~: photorealistic digital humans with full PBR material definitions. However, **the MetaHuman EULA explicitly prohibits using MetaHumans as AI training data**: "You must ensure that your activities with the Licensed Technology do not result in using the Licensed Technology as a training input or prompt-based input into any Generative AI Program." MetaHumans can be used within AI-enhanced workflows, but not to train AI models.

The ground truth is inherent in synthetic rendering: you created the scene, so you already have the PBR maps. A VFX studio with a standard character pipeline could generate a training dataset in a week.

### Existing datasets and published results

The lightstage data advantage that the CVPR paper frames as a competitive moat was real in 2023-2024. It no longer is.

**Public OLAT datasets now rival Beeble's scale:**

- **POLAR** (Dec 2025, public) -- 220 subjects, 156 light directions, 32 views, 4K, 28.8 million images total. Beeble's CVPR paper reports 287 subjects; POLAR is at 77% of that count and freely available.
  https://rex0191.github.io/POLAR/
- **HumanOLAT** (ICCV 2025, public, gated) -- 21 subjects, full body, 40 cameras at 6K, 331 LEDs. The first public full-body OLAT dataset.
  https://vcai.mpi-inf.mpg.de/projects/HumanOLAT/

**Synthetic approaches already match lightstage quality:**

- **SynthLight** (Adobe/Yale, CVPR 2025) -- trained purely on ~350 synthetic 3D heads rendered in Blender with PBR materials. Achieves results comparable to lightstage-trained methods on lightstage test data, with no lightstage data used at all.
  https://vrroom.github.io/synthlight/
- **NVIDIA Lumos** (SIGGRAPH Asia 2022) -- rendered 300k synthetic samples in a virtual lightstage. Matched state-of-the-art lightstage methods three years ago.
- **OpenHumanBRDF** (July 2025) -- 147 human models with full PBR decomposition including SSS, built in Blender. Exactly the kind of dataset needed for training PBR decomposition models.
  https://arxiv.org/abs/2507.18385

**Cost to replicate:** Generating a competitive synthetic dataset costs approximately $4,500-$18,000 total (Blender + MPFB2 for character generation, Cycles for rendering, cloud GPUs for compute). Raw GPU compute for 100k PBR renders is approximately $55 on an A100. CHORD (Ubisoft) trained its PBR decomposition model in 5.2 days on a single H100, costing approximately $260-500 in compute.
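
The $55 figure is easy to sanity-check. A back-of-envelope sketch (the per-render time and hourly rate are our assumptions--roughly a 1k-resolution Cycles PBR render on a cloud A100 at spot pricing--not measured values):

```python
def render_cost_usd(num_renders, secs_per_render=1.2, usd_per_hour=1.65):
    """Total GPU cost for a batch of synthetic PBR renders."""
    gpu_hours = num_renders * secs_per_render / 3600
    return gpu_hours * usd_per_hour

# 100k renders at ~1.2 s each is ~33 GPU-hours, on the order of $55.
cost = render_cost_usd(100_000)
```

Even if the per-render time is off by a factor of five, the compute bill stays in the hundreds of dollars, which is the point of the comparison.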

With model sizes under 2 GB (based on the encrypted model files in Beeble's distribution) and standard encoder-decoder architectures, the compute cost to train equivalent models from synthetic data is modest--well within reach of independent researchers or small studios.

This does not mean Beeble's trained weights are worthless. But the barrier to replication is lower than the marketing suggests, especially given that the model architectures are standard open-source frameworks and equivalent training data is now publicly available.

## 5. Relighting

**What Beeble claims**: The CVPR paper describes a "Render Net" for relighting. This is the least well-characterized stage in our analysis--the relighting model's architecture could not be determined from the available evidence.

### NVIDIA DiffusionRenderer (replaces both PBR decomposition AND relighting)

This is the most significant recent development. NVIDIA's DiffusionRenderer does the same thing as Beeble's entire core pipeline--video to PBR passes plus relighting--in a single open-source system.

- **DiffusionRenderer** (NVIDIA, CVPR 2025 Oral) -- a general-purpose method for both neural inverse and forward rendering. Two modes:
  - **Inverse**: input image/video → geometry and material buffers (albedo, normals, roughness, metallic)
  - **Forward**: G-buffers + environment map → photorealistic relit output

  The upgraded **Cosmos DiffusionRenderer** (June 2025) brings improved quality, powered by NVIDIA Cosmos video foundation models.

  GitHub: https://github.com/nv-tlabs/cosmos-transfer1-diffusion-renderer
  Academic version: https://github.com/nv-tlabs/diffusion-renderer
  Weights: https://huggingface.co/collections/zianw/cosmos-diffusionrenderer-6849f2a4da267e55409b8125
  **License: Apache 2.0 (code), NVIDIA Open Model License (weights)**

  Hardware: approximately 16GB VRAM recommended.

**ComfyUI integration**: A community wrapper exists at https://github.com/eggsbenedicto/DiffusionRenderer-ComfyUI (experimental, Linux-tested). It requires downloading the Cosmos DiffusionRenderer checkpoints and the NVIDIA video tokenizer (Cosmos-1.0-Tokenizer-CV8x8x8).

This is a direct, open-source replacement for Beeble's core value proposition, backed by NVIDIA's resources and selected for an oral presentation at CVPR 2025.

### IC-Light (image relighting)

- **IC-Light** (ICLR 2025, by lllyasviel, the ControlNet creator) -- the leading open-source image relighting model. Two modes: text-conditioned (describe the target lighting) and background-conditioned (provide a background image whose lighting should be matched). Based on Stable Diffusion; V2 is available with a 16-channel VAE.
  GitHub: https://github.com/lllyasviel/IC-Light

IC-Light uses diffusion-based lighting transfer rather than physics-based rendering. The results look different--less physically precise, but more flexible for creative lighting scenarios.

Available in ComfyUI via multiple community node packages.

### Manual relighting with PBR passes

If you have normal, albedo, roughness, and metallic maps from steps 3-4, you can do relighting directly in any 3D application:

- **Blender**: Import the passes as textures on a plane, apply a Principled BSDF shader, and light the scene with any HDRI or light setup. This gives you full artistic control.
- **Nuke**: Use the PBR passes with Nuke's relighting nodes for compositing-native workflows.
- **Unreal Engine**: Import the passes as material textures for real-time PBR rendering.
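
What these renderers do with the passes reduces, in the simplest diffuse case, to per-pixel shading against the normal map. A minimal Lambert term sketch (illustrative only; a Principled BSDF adds specular, roughness, and metallic terms on top of this):

```python
def lambert_shade(normal, light_dir, albedo):
    """Diffuse shading of one pixel: albedo scaled by clamped N.L.

    `normal` and `light_dir` are unit 3-vectors; `albedo` is linear RGB.
    This is the base term that roughness/metallic-aware BRDFs build on.
    """
    n_dot_l = max(0.0, sum(n * l for n, l in zip(normal, light_dir)))
    return tuple(a * n_dot_l for a in albedo)
```

A surface facing the light receives full albedo; one facing away goes black, which is exactly why a good normal pass is the prerequisite for any manual relight.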

This approach is arguably more powerful than SwitchLight for professional VFX work, because you have complete control over the lighting. The tradeoff is that it requires manual setup rather than one-click processing.

## 6. Feature extraction and segmentation

**What Beeble uses**: DINOv2 via timm (feature extraction), segmentation_models_pytorch (segmentation)

These are intermediate pipeline components in Beeble's architecture. DINOv2 produces feature maps that feed into other models, and the segmentation model likely handles scene parsing or material classification.

Most users replicating Beeble's outputs will not need these directly. StableNormal already uses DINOv2 features internally, and CHORD handles its own segmentation. If you do need them:

```bash
pip install timm segmentation-models-pytorch
```

```python
import timm

# DINOv2 ViT-L/14 backbone; use model.forward_features() for patch features
model = timm.create_model('vit_large_patch14_dinov2.lvd142m', pretrained=True)
model.eval()
```

## Comparison with Beeble

| Pipeline stage | Beeble model | Open-source equivalent | Parity |
|----------------|--------------|------------------------|--------|
| Person detection | RT-DETR (open source) | RT-DETR / YOLOv8 | Identical (same model) |
| Face detection | Kornia face detection (open source) | Kornia / RetinaFace | Identical (same model) |
| Tracking | BoxMOT (open source) | BoxMOT / ByteTrack | Identical (same model) |
| Alpha matte | InSPyReNet (open source) | InSPyReNet / BiRefNet | Identical (same model) |
| Depth map | Depth Anything V2 (open source) | Depth Anything V2 | Identical (same model) |
| Edge detection | DexiNed (open source) | DexiNed | Identical (same model) |
| Normal map | SMP + timm backbone (proprietary weights) | StableNormal / NormalCrafter | Comparable or better |
| Base color | SMP + timm backbone (proprietary weights) | CHORD / RGB↔X | Weaker for portraits |
| Roughness | SMP + timm backbone (proprietary weights) | CHORD / RGB↔X | Weaker for portraits |
| Metallic | SMP + timm backbone (proprietary weights) | CHORD / RGB↔X | Weaker for portraits |
| Specular | SMP + timm backbone (proprietary weights) | CHORD / RGB↔X | Weaker for portraits |
| Super resolution | RRDB-Net (open source) | ESRGAN / Real-ESRGAN | Identical (same model) |
| Relighting | Proprietary (not fully characterized) | DiffusionRenderer / IC-Light / manual | Comparable (DiffusionRenderer) |
| Full inverse+forward rendering | Entire pipeline | DiffusionRenderer (NVIDIA, CVPR 2025) | Direct open-source competitor |

The "Beeble model" column reflects what was found in the application binary, not what the CVPR paper describes. See [REPORT.md](REPORT.md) section 4 for the full architecture analysis.

Where open-source matches or exceeds Beeble: alpha, depth, normals, detection, tracking, edge detection, and super resolution. Every preprocessing stage in Beeble's pipeline uses the same open-source models you can use directly. For video normals, NormalCrafter provides temporal consistency comparable to Beeble's pipeline.

Where Beeble retains an advantage: PBR material decomposition for human subjects (base color, roughness, metallic, specular). While the architecture appears to use standard open-source frameworks, the model was trained on portrait-specific data; the open-source PBR models were trained on material textures and interior scenes. However, as discussed above, the barrier to creating equivalent training data with synthetic rendering is lower than commonly assumed.

Where DiffusionRenderer changes the picture: NVIDIA's DiffusionRenderer (CVPR 2025 Oral) handles both inverse rendering (video → PBR maps) and forward rendering (PBR maps + lighting → relit output) in a single open-source system. It is the first open-source tool that directly replicates Beeble's entire core pipeline, including relighting. It is backed by NVIDIA's resources, uses Apache 2.0 licensing for the code, and has a ComfyUI integration available.

Where open-source wins on flexibility: manual relighting in Blender/Nuke with the extracted PBR passes gives full artistic control that Beeble's automated pipeline does not offer.

## What this means for Beeble users

If you primarily use Beeble for alpha mattes and depth maps, you can replicate those results for free using the exact same models.

If you use Beeble for normal maps, the open-source alternatives are now competitive and in some cases better, with NormalCrafter solving the video temporal-consistency problem.

If you use Beeble for full PBR decomposition of portrait footage and need high-quality material properties, Beeble's model still has an edge due to its portrait-specific training data. But the gap is narrowing as models like CHORD improve.

If you use Beeble for one-click relighting, NVIDIA's DiffusionRenderer is a direct open-source competitor that handles both PBR decomposition and relighting in a single system. IC-Light provides a diffusion-based alternative, and manual PBR relighting in Blender/Nuke gives you full artistic control.

The core value proposition of Beeble Studio--beyond the models themselves--is convenience. It packages everything into a single application with a render queue, plugin integrations, and a polished UX. Replicating the pipeline in ComfyUI requires more setup and technical knowledge, but it costs nothing and gives you full control over every stage.