# Replicating Beeble's Pipeline with Open-Source Tools

Most Beeble Studio users pay for PBR extractions--alpha mattes, depth
maps, normal maps--rather than the relighting. The extraction pipeline
is built from open-source models, and the PBR decomposition and
relighting stages now have viable open-source alternatives too.

This guide documents how to replicate each stage using ComfyUI and
direct Python. No workflow JSON files are provided yet, but the
relevant nodes, models, and tradeoffs are documented below.

If you are unfamiliar with ComfyUI, see https://docs.comfy.org/

## Pipeline overview

```
Input frame
    |
    +--> Background removal --> Alpha matte
    |    (InSPyReNet / BiRefNet)
    |
    +--> Depth estimation --> Depth map
    |    (Depth Anything V2)
    |
    +--> Normal estimation --> Normal map
    |    (StableNormal / NormalCrafter)
    |
    +--> PBR decomposition --> Albedo, Roughness, Metallic
    |    (CHORD / RGB↔X)
    |
    +--> Relighting --> Relit output
         (IC-Light / manual in Blender/Nuke)
```

The first two stages use the exact same models Beeble uses. The
remaining stages use different models that produce comparable outputs.

## 1. Background removal (Alpha matte)

**What Beeble uses**: transparent-background / InSPyReNet (MIT)

This is the simplest stage. Several ComfyUI nodes wrap the same
underlying model:

- **ComfyUI-InSPyReNet** -- wraps the same `transparent-background`
  library Beeble uses. Install via ComfyUI Manager.
- **ComfyUI-BiRefNet** -- uses BiRefNet, a newer model that often
  produces sharper edges around hair and fine detail.
- **ComfyUI-RMBG** -- BRIA's background removal model, another strong
  alternative.

For video, connect an image sequence loader to the removal node and
export the alpha channel as a separate pass. These models process
per-frame, so there is no temporal consistency--but alpha mattes are
typically stable enough that this is not a problem.

**Direct Python**:
```bash
pip install transparent-background
```
```python
from PIL import Image
from transparent_background import Remover

remover = Remover()  # downloads InSPyReNet weights on first run
image = Image.open("frame_0001.png").convert("RGB")
alpha = remover.process(image, type='map')  # grayscale matte
```

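Once you have the matte, attaching it to the frame as an alpha channel
(and premultiplying, as most compositing packages expect) is plain
array math. A minimal numpy sketch--the function names here are just
illustrative helpers, not part of any library:

```python
import numpy as np

def apply_matte(rgb: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    """Attach a single-channel matte to an RGB frame as RGBA.

    rgb:   (H, W, 3) uint8 frame
    alpha: (H, W) uint8 matte from the background-removal model
    """
    return np.dstack([rgb, alpha])

def premultiply(rgba: np.ndarray) -> np.ndarray:
    """Premultiply RGB by alpha; returns float32 in [0, 1]."""
    out = rgba.astype(np.float32) / 255.0
    out[..., :3] *= out[..., 3:4]
    return out
```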
## 2. Depth estimation

**What Beeble uses**: Depth Anything V2 via Kornia (Apache 2.0)

- **ComfyUI-DepthAnythingV2** -- dedicated nodes for all model sizes.
- **comfyui_controlnet_aux** -- includes Depth Anything V2 as a
  preprocessor option.

Use the `large` variant for best quality. This is a per-frame model
with no temporal information, but monocular depth tends to be stable
across frames for most footage.

**Direct Python**:
```bash
pip install kornia
```
```python
from kornia.contrib import DepthAnything

model = DepthAnything.from_pretrained("depth-anything-v2-large")
depth = model(image_tensor)  # image_tensor: a (B, 3, H, W) float tensor
```

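Whichever backend produces it, the model returns relative depth, so
exporting a usable pass usually means normalizing to a known range
first. A minimal numpy sketch, assuming a 16-bit integer pass and a
farthest-is-white convention (invert if your compositor expects the
opposite):

```python
import numpy as np

def depth_to_16bit(depth: np.ndarray) -> np.ndarray:
    """Normalize a relative depth map to the full 16-bit range.

    depth: (H, W) float array of relative depth values (any scale)
    Returns (H, W) uint16, suitable for a 16-bit PNG/TIFF pass.
    """
    d = depth.astype(np.float64)
    lo, hi = d.min(), d.max()
    if hi - lo < 1e-12:          # constant map: avoid divide-by-zero
        return np.zeros(d.shape, dtype=np.uint16)
    norm = (d - lo) / (hi - lo)  # rescale to [0, 1]
    return (norm * 65535.0 + 0.5).astype(np.uint16)
```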
## 3. Normal estimation

**What Beeble claims**: The CVPR 2024 paper describes a dedicated
"Normal Net" within SwitchLight. However, analysis of the deployed
application found no evidence of this specific architecture--the PBR
models appear to use standard encoder-decoder segmentation frameworks
with pretrained backbones (see [REPORT.md](REPORT.md) section 4 for
details).

Multiple open-source models now produce high-quality surface normals
from single images, and one handles video with temporal consistency.


### For single images

- **StableNormal** (SIGGRAPH Asia 2024) -- currently the strongest
  published benchmarks for monocular normal estimation. Uses a
  two-stage coarse-to-fine strategy with DINOv2 semantic features for
  guidance. A turbo variant runs 10x faster with minimal quality loss.
  GitHub: https://github.com/Stable-X/StableNormal

- **DSINE** (CVPR 2024) -- discriminative CNN-based approach. No
  diffusion overhead, so it is fast. Competitive with StableNormal on
  NYUv2 benchmarks. Good choice when inference speed matters.
  GitHub: https://github.com/baegwangbin/DSINE

- **GeoWizard** (ECCV 2024) -- jointly predicts depth AND normals from
  a single image, which guarantees geometric consistency between the
  two. Available in ComfyUI via ComfyUI-Geowizard.
  GitHub: https://github.com/fuxiao0719/GeoWizard

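All of these models deliver normals encoded as RGB images. Converting
between that encoding and actual unit vectors is a simple affine map.
A minimal numpy sketch, assuming the common `n * 0.5 + 0.5` convention
(individual models may flip the Y or Z axis, so check before mixing
sources):

```python
import numpy as np

def decode_normals(rgb: np.ndarray) -> np.ndarray:
    """Map an 8-bit normal map back to unit vectors in [-1, 1]."""
    n = rgb.astype(np.float32) / 255.0 * 2.0 - 1.0
    # Renormalize: 8-bit quantization leaves vectors slightly off unit.
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

def encode_normals(n: np.ndarray) -> np.ndarray:
    """Map unit vectors back to an 8-bit RGB normal map."""
    return np.clip((n * 0.5 + 0.5) * 255.0 + 0.5, 0, 255).astype(np.uint8)
```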
### For video (temporally consistent normals)

- **NormalCrafter** (2025) -- the most relevant model for replicating
  Beeble's video pipeline. It uses video diffusion priors to produce
  temporally consistent normal maps across frames, directly comparable
  to SwitchLight 3.0's "true video model" claim. Has ComfyUI nodes via
  ComfyUI-NormalCrafterWrapper.
  GitHub: https://github.com/AIWarper/ComfyUI-NormalCrafterWrapper
  Paper: https://arxiv.org/abs/2504.11427

Key parameters for ComfyUI:
- `window_size`: number of frames processed together (default 14).
  Larger = better temporal consistency, more VRAM.
- `time_step_size`: how far the window slides. Set smaller than
  `window_size` for overlapping windows and smoother transitions.

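How the two parameters interact can be sketched with a small
scheduling helper. This is not part of the wrapper's API--just the
arithmetic behind overlapping windows:

```python
def plan_windows(num_frames: int, window_size: int = 14,
                 time_step_size: int = 10) -> list[range]:
    """Frame ranges a sliding-window video model visits.

    time_step_size < window_size makes consecutive windows overlap;
    the overlapping frames get blended for smoother transitions.
    """
    windows = []
    start = 0
    while True:
        end = min(start + window_size, num_frames)
        windows.append(range(start, end))
        if end == num_frames:
            break
        start += time_step_size
    return windows

# 40 frames, window 14, step 10 -> each window overlaps the next by 4
for w in plan_windows(40):
    print(w.start, w.stop)
```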

**Assessment**: For static images, StableNormal likely matches or
exceeds Beeble's normal quality, since it is a specialized model
rather than one sub-network within a larger system. For video,
NormalCrafter addresses the temporal consistency problem that was
previously a key differentiator of Beeble's pipeline.

## 4. PBR material decomposition (Albedo, Roughness, Metallic)

**What Beeble claims**: The CVPR 2024 paper describes a "Specular Net"
and analytical albedo derivation using a Cook-Torrance reflectance
model. Analysis of the deployed application found no Cook-Torrance,
BRDF, or physics-based rendering terminology in the binary. The PBR
models appear to use standard segmentation architectures
(segmentation_models_pytorch with pretrained backbones) trained on
proprietary portrait data. See [REPORT.md](REPORT.md) section 4.

Regardless of how Beeble implements PBR decomposition, this is the
hardest stage to replicate with open-source tools. Beeble's model was
trained on portrait and human subject data. The open-source
alternatives were trained on different data, which affects quality for
human subjects.

### Available models

- **CHORD** (Ubisoft La Forge, SIGGRAPH Asia 2025) -- the most
  complete open-source option. Decomposes a single image into base
  color, normal, height, roughness, and metalness using chained
  diffusion. Has official ComfyUI nodes from Ubisoft. Weights on
  HuggingFace (`Ubisoft/ubisoft-laforge-chord`).
  GitHub: https://github.com/ubisoft/ComfyUI-Chord
  **License: Research-only (Ubisoft ML License)**

  Limitation: trained on the MatSynth dataset (~5700 PBR materials),
  which is texture/material focused. Results on human skin, hair, and
  clothing will be plausible but not specifically optimized for
  portrait data. The authors note metalness prediction is notably
  difficult.

- **RGB↔X** (Adobe, SIGGRAPH 2024) -- decomposes into albedo,
  roughness, metallicity, normals, AND estimates lighting. Trained on
  interior scenes. Fully open-source code and weights.
  GitHub: https://github.com/zheng95z/rgbx
  Minimum 12GB VRAM recommended.

  Limitation: trained on interior scene data, not portrait/human data.
  The albedo estimation for rooms and furniture is strong; for human
  subjects it is less well-characterized.

- **PBRify Remix** -- simpler model for generating PBR maps from
  diffuse textures. Trained on CC0 data from ambientCG, so no license
  concerns. Designed for game texture upscaling rather than
  photographic decomposition.
  GitHub: https://github.com/Kim2091/PBRify_Remix

### The honest gap

Beeble's PBR model was trained on portrait and human subject data
(likely lightstage captures, based on the CVPR paper). The open-source
alternatives were trained on material textures or interior scenes. For
portrait work, this means:

- Skin subsurface scattering properties will be better captured by
  Beeble's model
- Hair specularity and anisotropy are hard for general-purpose models
- Clothing material properties (roughness, metallic) should be
  comparable

For non-portrait subjects (products, environments, objects), the
open-source models may actually perform better since they were trained
on more diverse material data.

If your goal is manual relighting in Blender or Nuke rather than
automated AI relighting, "good enough" PBR passes are often sufficient
because you have artistic control over the final result.

### On training data and the "moat"

The CVPR paper frames lightstage training data as a significant
competitive advantage. This deserves scrutiny from VFX professionals.

For PBR decomposition training, what you actually need is a dataset of
images paired with ground-truth PBR maps--albedo, normal, roughness,
metallic. Physical lightstage captures are one way to obtain this
data, but modern synthetic rendering provides the same thing more
cheaply and at greater scale:

- **Unreal Engine MetaHumans**: photorealistic digital humans with
  full PBR material definitions. Render them under varied lighting and
  you have ground-truth PBR for each frame.
- **Blender character generators** (Human Generator, MB-Lab): produce
  characters with known material properties that can be rendered
  procedurally.
- **Houdini procedural pipelines**: can generate hundreds of thousands
  of unique character/lighting/pose combinations programmatically.

The ground truth is inherent in synthetic rendering: you created the
scene, so you already have the PBR maps. A VFX studio with a standard
character pipeline could generate a training dataset in a week.
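
As a concrete illustration, training pairs can be indexed straight off
a render directory where each frame writes its beauty pass alongside
its PBR AOVs. The naming scheme below is hypothetical--adapt it to
your renderer's output convention:

```python
from pathlib import Path

# Hypothetical AOV suffixes; match these to your renderer's outputs.
AOVS = ("albedo", "normal", "roughness", "metallic")

def index_training_pairs(render_dir: str) -> list[dict[str, Path]]:
    """Pair each beauty render with its ground-truth PBR passes.

    Expects files named like frame_0001.beauty.png next to
    frame_0001.albedo.png etc.; frames missing any pass are skipped.
    """
    root = Path(render_dir)
    pairs = []
    for beauty in sorted(root.glob("*.beauty.png")):
        stem = beauty.name.removesuffix(".beauty.png")
        maps = {a: root / f"{stem}.{a}.png" for a in AOVS}
        if all(p.exists() for p in maps.values()):
            pairs.append({"input": beauty, **maps})
    return pairs
```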

With model sizes under 2 GB (based on the encrypted model files in
Beeble's distribution) and standard encoder-decoder architectures, the
compute cost to train equivalent models from synthetic data is
modest--well within reach of independent researchers or small studios.

This does not mean Beeble's trained weights are worthless. But the
barrier to replication is lower than the marketing suggests,
especially given that the model architectures are standard open-source
frameworks.

## 5. Relighting

**What Beeble claims**: The CVPR paper describes a "Render Net" for
relighting. This is the least well-characterized stage in our
analysis--the relighting model's architecture could not be determined
from the available evidence.

### AI-based relighting

- **IC-Light** (ICLR 2025, by lllyasviel / ControlNet creator) -- the
  leading open-source relighting model. Two modes: text-conditioned
  (describe the target lighting) and background-conditioned (provide a
  background image whose lighting should be matched). Based on Stable
  Diffusion.
  GitHub: https://github.com/lllyasviel/IC-Light

IC-Light uses diffusion-based lighting transfer rather than
physics-based rendering. The results look different--less physically
precise but more flexible in terms of creative lighting scenarios.

Available in ComfyUI via multiple community node packages.

### Manual relighting with PBR passes

If you have normal, albedo, roughness, and metallic maps from steps
3-4, you can do relighting directly in any 3D application:

- **Blender**: Import the passes as textures on a plane, apply a
  Principled BSDF shader, and light the scene with any HDRI or light
  setup. This gives you full artistic control.
- **Nuke**: Use the PBR passes with Nuke's relighting nodes for
  compositing-native workflows.
- **Unreal Engine**: Import as material textures for real-time PBR
  rendering.

This approach is arguably more powerful than SwitchLight for
professional VFX work because you have complete control over the
lighting. The tradeoff is that it requires manual setup rather than
one-click processing.
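
At its core, what these packages do with the passes is ordinary
shading math. As a toy illustration--diffuse term only, no specular,
shadows, or subsurface, so not how Blender or Nuke evaluate a full
BSDF--relighting the albedo with the decoded normals under a single
directional light looks like:

```python
import numpy as np

def relight_lambert(albedo: np.ndarray, normals: np.ndarray,
                    light_dir, light_rgb=(1.0, 1.0, 1.0)) -> np.ndarray:
    """Diffuse-only relight from PBR passes.

    albedo:    (H, W, 3) floats in [0, 1]
    normals:   (H, W, 3) unit vectors in [-1, 1]
    light_dir: direction TOWARD the light; normalized here
    """
    l = np.asarray(light_dir, dtype=np.float32)
    l = l / np.linalg.norm(l)
    # Lambert's cosine law: N . L, clamped so back-facing pixels go black
    ndotl = np.clip(normals @ l, 0.0, None)[..., None]
    return np.clip(albedo * np.asarray(light_rgb) * ndotl, 0.0, 1.0)
```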
## 6. Feature extraction and segmentation

**What Beeble uses**: DINOv2 via timm (feature extraction),
segmentation_models_pytorch (segmentation)

These are intermediate pipeline components in Beeble's architecture.
DINOv2 produces feature maps that feed into other models, and the
segmentation model likely handles scene parsing or material
classification.

Most users replicating Beeble's outputs will not need these directly.
StableNormal already uses DINOv2 features internally, and CHORD
handles its own segmentation. If you do need them:

```bash
pip install timm segmentation-models-pytorch
```
```python
import timm

# DINOv2 ViT-L/14 pretrained on the LVD-142M dataset
model = timm.create_model('vit_large_patch14_dinov2.lvd142m',
                          pretrained=True)
model.eval()
```

## Comparison with Beeble

| Pipeline stage | Beeble model | Open-source equivalent | Parity |
|----------------|--------------|------------------------|--------|
| Person detection | RT-DETR (open source) | RT-DETR / YOLOv8 | Identical (same model) |
| Face detection | Kornia face detection (open source) | Kornia / RetinaFace | Identical (same model) |
| Tracking | BoxMOT (open source) | BoxMOT / ByteTrack | Identical (same model) |
| Alpha matte | InSPyReNet (open source) | InSPyReNet / BiRefNet | Identical (same model) |
| Depth map | Depth Anything V2 (open source) | Depth Anything V2 | Identical (same model) |
| Edge detection | DexiNed (open source) | DexiNed | Identical (same model) |
| Normal map | SMP + timm backbone (proprietary weights) | StableNormal / NormalCrafter | Comparable or better |
| Base color | SMP + timm backbone (proprietary weights) | CHORD / RGB↔X | Weaker for portraits |
| Roughness | SMP + timm backbone (proprietary weights) | CHORD / RGB↔X | Weaker for portraits |
| Metallic | SMP + timm backbone (proprietary weights) | CHORD / RGB↔X | Weaker for portraits |
| Specular | SMP + timm backbone (proprietary weights) | CHORD / RGB↔X | Weaker for portraits |
| Super resolution | RRDB-Net (open source) | ESRGAN / Real-ESRGAN | Identical (same model) |
| Relighting | Proprietary (not fully characterized) | IC-Light / manual | Different approach |

The "Beeble model" column reflects what was found in the application
binary, not what the CVPR paper describes. See
[REPORT.md](REPORT.md) section 4 for the full architecture analysis.

Where open-source matches or exceeds Beeble: alpha, depth, normals,
detection, tracking, edge detection, and super resolution. Every
preprocessing stage in Beeble's pipeline uses the same open-source
models you can use directly. For video normals, NormalCrafter provides
temporal consistency comparable to Beeble's pipeline.

Where Beeble retains an advantage: PBR material decomposition for
human subjects (base color, roughness, metallic, specular). While the
architecture appears to use standard open-source frameworks, the model
was trained on portrait-specific data. The open-source PBR models were
trained on material textures and interior scenes. However, as
discussed above, the barrier to creating equivalent training data
using synthetic rendering is lower than commonly assumed.

Where open-source wins on flexibility: manual relighting in
Blender/Nuke with the extracted PBR passes gives full artistic control
that Beeble's automated pipeline does not offer.

## What this means for Beeble users

If you primarily use Beeble for alpha mattes and depth maps, you can
replicate those results for free using the exact same models.

If you use Beeble for normal maps, the open-source alternatives are
now competitive and in some cases better, with NormalCrafter solving
the video temporal consistency problem.

If you use Beeble for full PBR decomposition of portrait footage and
need high-quality material properties, Beeble's model still has an
edge due to its portrait-specific training data. But the gap is
narrowing as models like CHORD improve.

If you use Beeble for one-click relighting, IC-Light provides a
different but functional alternative, and manual PBR relighting in
Blender/Nuke gives you more control.

The core value proposition of Beeble Studio--beyond the models
themselves--is convenience. It packages everything into a single
application with a render queue, plugin integrations, and a polished
UX. Replicating the pipeline in ComfyUI requires more setup and
technical knowledge, but costs nothing and gives you full control over
every stage.