# Beeble Studio: Technical Analysis
**Date**: January 2026

**Subject**: Beeble Studio desktop application (Linux x86_64 RPM)

**Scope**: Identification of third-party components and architectural
analysis of the application's AI pipeline

## 1. Introduction

Beeble Studio is a desktop application for VFX professionals that
generates physically-based rendering (PBR) passes from video footage.
It produces alpha mattes (background removal), depth maps, normal
maps, base color, roughness, specular, and metallic passes, along
with AI-driven relighting capabilities.

Beeble markets its pipeline as being "Powered by SwitchLight 3.0,"
their proprietary video-to-PBR model. The original SwitchLight
architecture was published at CVPR 2024 as a highlight paper (top
~10% of accepted papers), but that paper describes SwitchLight 1.0.
The product has since gone through at least two major rebuilds:
SwitchLight 2.0 (June 2025, described by Beeble as a "complete
architecture rebuild") and SwitchLight 3.0 (November 2025, marketed
as a "true video model"). The application is sold as a subscription
product, with plans starting at $42/month.

This analysis was prompted by observing that several of Beeble
Studio's output passes closely resemble the outputs of well-known
open-source models. Standard forensic techniques--string extraction
from process memory, TensorRT plugin analysis, PyInstaller module
listing, Electron app inspection, and manifest analysis--were used
to determine which components the application actually contains and
how they are organized.

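The string-extraction step used throughout this analysis is conceptually
the same as the Unix `strings` utility: scan a byte buffer for runs of
printable characters above a minimum length. A minimal sketch (the
function name and the sample dump are ours, purely illustrative):

```python
import string

# Printable ASCII bytes, excluding control-like whitespace.
PRINTABLE = set(string.printable.encode()) - set(b"\t\n\r\x0b\x0c")

def extract_strings(data: bytes, min_len: int = 6):
    """Yield printable ASCII runs of at least min_len bytes."""
    run = bytearray()
    for byte in data:
        if byte in PRINTABLE:
            run.append(byte)
        else:
            if len(run) >= min_len:
                yield run.decode("ascii")
            run.clear()
    if len(run) >= min_len:
        yield run.decode("ascii")

# Example: find library identifiers in a fake memory dump.
dump = b"\x00\x7fELF\x00kornia.models.detection.rtdetr\x00\x01\x02timm\x00"
hits = [s for s in extract_strings(dump) if "kornia" in s]
```

Applied to a 2 GB memory dump, exactly this kind of scan surfaces the
module paths, docstrings, and layer names cited in section 3.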
## 2. Findings summary

The analysis identified four open-source models used directly for
user-facing outputs, a complete open-source detection and tracking
pipeline used for preprocessing, additional open-source architectural
components, and a proprietary model whose architecture raises questions
about how "proprietary" should be understood.

| Pipeline stage | Component | License | Open source |
|---------------|-----------|---------|-------------|
| Background removal (alpha) | transparent-background / InSPyReNet | MIT | Yes |
| Depth estimation | Depth Anything V2 via Kornia | Apache 2.0 | Yes |
| Person detection | RT-DETR via Kornia | Apache 2.0 | Yes |
| Face detection | Kornia face detection | Apache 2.0 | Yes |
| Multi-object tracking | BoxMOT via Kornia | MIT | Yes |
| Edge detection | DexiNed via Kornia | Apache 2.0 | Yes |
| Feature extraction | DINOv2 via timm | Apache 2.0 | Yes |
| Segmentation | segmentation_models_pytorch | MIT | Yes |
| Backbone architecture | PP-HGNet via timm | Apache 2.0 | Yes |
| Super resolution | RRDB-Net via Kornia | Apache 2.0 | Yes |
| PBR decomposition / relighting | SwitchLight 3.0 | Proprietary | See section 4 |

The preprocessing pipeline--background removal, depth estimation,
feature extraction, segmentation--is composed entirely of open-source
models used off the shelf.

The PBR decomposition and relighting stage is marketed as
"SwitchLight 3.0." The CVPR 2024 paper describes it as a
physics-based inverse rendering system with dedicated sub-networks
(Normal Net, Specular Net) and a Cook-Torrance reflectance model.
However, the application binary contains no references to any of
this physics-based terminology, and the architectural evidence
suggests the models are built from standard encoder-decoder
segmentation frameworks with pretrained backbones from timm.
This is discussed in detail in section 4.

The reconstructed pipeline architecture:

```
Input Video Frame
  |
  +--[RT-DETR + PP-HGNet]------------------> Person Detection
  |        |
  |        +--[BoxMOT]---------------------> Tracking (multi-frame)
  |
  +--[Face Detection]----------------------> Face Regions
  |
  +--[InSPyReNet]--------------------------> Alpha Matte
  |
  +--[Depth Anything V2]-------------------> Depth Map
  |
  +--[DINOv2]------------------------------> Feature Maps
  |        |
  |        +--[segmentation_models_pytorch]-> Segmentation
  |
  +--[DexiNed]-----------------------------> Edge Maps
  |
  +--[SMP encoder-decoder + PP-HGNet/ResNet backbone]
  |        |
  |        +----> Normal Map
  |        +----> Base Color
  |        +----> Roughness
  |        +----> Specular
  |        +----> Metallic
  |
  +--[RRDB-Net]----------------------------> Super Resolution
  |
  +--[Relighting model]--------------------> Relit Output
```

Each stage runs independently. The Electron app passes separate
CLI flags (`--run-alpha`, `--run-depth`, `--run-pbr`) to the engine
binary, and each flag can be used in isolation. This is not a unified
end-to-end model--it is a pipeline of independent models. The
detection and tracking stages (RT-DETR, BoxMOT, face detection) serve
as preprocessing, locating and tracking subjects across frames before
the extraction models run.

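The flag-per-stage design can be seen in how a front end would assemble
the engine invocation. A sketch of that assembly (the flag names come
from the binary; the function, defaults, and stage keys here are
illustrative, not Beeble's code):

```python
# Sketch: build the engine command line from independent stage toggles.
# Flag names (--run-alpha, --run-depth, --run-pbr, --pbr-stride, --fps)
# were observed in the binary; everything else is illustrative.

STAGE_FLAGS = {
    "alpha": "--run-alpha",
    "depth": "--run-depth",
    "pbr": "--run-pbr",
}

def build_engine_args(stages, fps=24.0, pbr_stride="1,2"):
    """Return CLI args for the requested stages; each stage is independent."""
    args = [STAGE_FLAGS[s] for s in stages if s in STAGE_FLAGS]
    if "pbr" in stages:
        args += ["--pbr-stride", pbr_stride]
    args += ["--fps", str(fps)]
    return args

alpha_only = build_engine_args(["alpha"])           # alpha with no depth/PBR
full = build_engine_args(["pbr", "alpha", "depth"])  # all three stages
```

Because each stage maps to its own flag, requesting only an alpha matte
never touches the depth or PBR models, which is consistent with the
pipeline-of-models picture rather than a single unified model.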
## 3. Evidence for each component

### 3.1 Background removal: transparent-background / InSPyReNet

The complete API docstring for the `transparent-background` Python
package was found verbatim in process memory:

```
Args:
    img (PIL.Image or np.ndarray): input image
    type (str): output type option as below.
        'rgba' will generate RGBA output regarding saliency score
        as an alpha map.
        'green' will change the background with green screen.
        'white' will change the background with white color.
        '[255, 0, 0]' will change the background with color
        code [255, 0, 0].
        'blur' will blur the background.
        'overlay' will cover the salient object with translucent
        green color, and highlight the edges.
Returns:
    PIL.Image: output image
```

This is a character-for-character match with the docstring published
at https://github.com/plemeri/transparent-background.

Additionally, TensorRT layer names found in the binary correspond to
Res2Net bottleneck blocks (`RnRes2Br1Br2c_TRT`, `RnRes2Br2bBr2c_TRT`,
`RnRes2FullFusion_TRT`), which is the backbone architecture used by
InSPyReNet. The `transparent_background.backbones.SwinTransformer`
module path was also found in the PyInstaller bundle's module list.

- **Library**: transparent-background (`pip install transparent-background`)
- **Model**: InSPyReNet (Kim et al., ACCV 2022)
- **License**: MIT
- **Paper**: https://arxiv.org/abs/2209.09475
- **Repository**: https://github.com/plemeri/transparent-background

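The "character-for-character match" claim can be made mechanical:
normalize the fragment extracted from memory and the published
reference the same way, then compare exactly. A minimal sketch (the
helper name is ours; only line-ending noise from extraction is
forgiven):

```python
def docstrings_match(extracted: str, reference: str) -> bool:
    """True if two docstring fragments are identical, ignoring only
    line-ending and trailing-whitespace noise from memory extraction."""
    def norm(s: str) -> str:
        return "\n".join(line.rstrip() for line in s.strip().splitlines())
    return norm(extracted) == norm(reference)

reference = "Args:\n    img (PIL.Image or np.ndarray): input image"
extracted = "Args:\r\n    img (PIL.Image or np.ndarray): input image\r\n"
```

Any substantive difference (a changed word, reordered options) fails
the comparison, so a match is strong evidence the exact library code
is present.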
### 3.2 Depth estimation: Depth Anything V2

The complete API documentation for Depth Anything V2's ONNX export
interface was found in process memory:

```
Export a DepthAnything model to an ONNX model file.

Args:
    model_name: The name of the model to be loaded.
        Valid model names include:
        - `depth-anything-v2-small`
        - `depth-anything-v2-base`
        - `depth-anything-v2-large`
    model_type: The type of the model to be loaded.
        Valid model types include:
        - `model`
        - `model_bnb4`
        - `model_fp16`
        - `model_int8`
```

This is accessed through Kornia's ONNX builder interface
(`kornia.onnx.DepthAnythingONNXBuilder`), with 50+ additional
references to Kornia's tutorials and modules throughout the binary.

- **Library**: Kornia (`pip install kornia`)
- **Model**: Depth Anything V2 (Yang et al., 2024)
- **License**: Apache 2.0
- **Paper**: https://arxiv.org/abs/2406.09414
- **Repository**: https://github.com/kornia/kornia

### 3.3 Feature extraction: DINOv2

Multiple references to DINOv2 were found across the application:

- Runtime warning: `WARNING:dinov2:xFormers not available` (captured
  from application output during normal operation)
- Model checkpoint references: `dinov2_vits14_pretrain.pth`,
  `dinov2_vitb14_pretrain.pth` (Meta's public model hosting)
- timm model registry name: `vit_large_patch14_dinov2.lvd142m`
- File path: `/mnt/work/Beeble_Models/lib/timm/models/hrnet.py`
  (timm library bundled in the application)

DINOv2 is Meta's self-supervised vision transformer. It does not
produce a user-facing output directly--it generates feature maps
that feed into downstream models. This is a standard pattern in
modern computer vision: use a large pretrained backbone for feature
extraction, then train smaller task-specific heads on top.

- **Library**: timm (`pip install timm`)
- **Model**: DINOv2 (Oquab et al., Meta AI, 2023)
- **License**: Apache 2.0
- **Paper**: https://arxiv.org/abs/2304.07193
- **Repository**: https://github.com/huggingface/pytorch-image-models

### 3.4 Segmentation: segmentation_models_pytorch

A direct reference to the library's GitHub repository, encoder/decoder
architecture parameters, and decoder documentation was found in
process memory:

```
encoder_name: Name of the encoder to use.
encoder_depth: Depth of the encoder.
decoder_channels: Number of channels in the decoder.
decoder_name: What decoder to use.

https://github.com/qubvel-org/segmentation_models.pytorch/
tree/main/segmentation_models_pytorch/decoders

Note:
    Only encoder weights are available.
    Pretrained weights for the whole model are not available.
```

This library is a framework for building encoder-decoder
segmentation models. It is not a model itself--it provides the
architecture (UNet, FPN, DeepLabV3, etc.) into which you plug a
pretrained encoder backbone (ResNet, EfficientNet, etc.) and train
the decoder on your own data for your specific task.

Its presence alongside the pretrained backbones described below
suggests it serves as the architectural foundation for one or more
of the PBR output models. This is discussed further in section 4.

- **Library**: segmentation_models_pytorch
  (`pip install segmentation-models-pytorch`)
- **License**: MIT
- **Repository**: https://github.com/qubvel-org/segmentation_models.pytorch

### 3.5 Backbone: PP-HGNet

The `HighPerfGpuNet` class was found in process memory along with its
full structure:

```
HighPerfGpuNet
HighPerfGpuNet.forward_features
HighPerfGpuNet.reset_classifier
HighPerfGpuBlock.__init__
LearnableAffineBlock
ConvBNAct.__init__
StemV1.__init__
StemV1.forward
_create_hgnetr
```

This is PP-HGNet (PaddlePaddle High Performance GPU Network), ported
to timm's model registry. Documentation strings confirm the identity:

```
PP-HGNet (V1 & V2)
PP-HGNetv2: https://github.com/PaddlePaddle/PaddleClas/
    .../pp_hgnet_v2.py
```

PP-HGNet is a convolutional backbone architecture designed for fast
GPU inference, originally developed for Baidu's RT-DETR real-time
object detection system. It is available as a pretrained backbone
through timm and is commonly used as an encoder in larger models.

PP-HGNet serves a dual role in the Beeble pipeline. First, it
functions as the backbone encoder for the RT-DETR person detection
model (see section 3.6). Second, based on the co-presence of
`segmentation_models_pytorch` and compatible encoder interfaces, it
likely serves as one of the backbone encoders for the PBR
decomposition models. This dual use is standard--the same pretrained
backbone can be loaded into different model architectures for
different tasks.

- **Library**: timm (`pip install timm`)
- **Model**: PP-HGNet (Baidu/PaddlePaddle)
- **License**: Apache 2.0
- **Repository**: https://github.com/huggingface/pytorch-image-models

### 3.6 Detection and tracking pipeline

The binary contains a complete person detection and tracking pipeline
built from open-source models accessed through Kornia.

**RT-DETR (Real-Time Detection Transformer).** Full module paths for
RT-DETR were found in the binary:

```
kornia.contrib.models.rt_detr.architecture.hgnetv2
kornia.contrib.models.rt_detr.architecture.resnet_d
kornia.contrib.models.rt_detr.architecture.rtdetr_head
kornia.contrib.models.rt_detr.architecture.hybrid_encoder
kornia.models.detection.rtdetr
```

RT-DETR model configuration strings confirm the PP-HGNet connection:

```
Configuration to construct RT-DETR model.
- HGNetV2-L: 'hgnetv2_l' or RTDETRModelType.hgnetv2_l
- HGNetV2-X: 'hgnetv2_x' or RTDETRModelType.hgnetv2_x
```

RT-DETR is Baidu's real-time object detection model, published at
ICLR 2024. It detects and localizes objects (including persons) in
images. In Beeble's pipeline, it likely serves as the initial stage
that identifies which regions of the frame contain subjects to
process.

- **Model**: RT-DETR (Zhao et al., 2024)
- **License**: Apache 2.0 (via Kornia)
- **Paper**: https://arxiv.org/abs/2304.08069

**Face detection.** The `kornia.contrib.face_detection` module and
`kornia.contrib.FaceDetectorResult` class were found in the binary.
This provides face region detection, likely used to guide the PBR
models in handling facial features (skin, eyes, hair) differently
from other body parts or clothing.

**BoxMOT (multi-object tracking).** The module path
`kornia.models.tracking.boxmot_tracker` was found in the binary.
BoxMOT is a multi-object tracking library that maintains identity
across video frames--given detections from RT-DETR on each frame,
BoxMOT tracks which detection corresponds to which person over time.

- **Repository**: https://github.com/mikel-brostrom/boxmot
- **License**: AGPL-3.0 for some trackers, MIT for others

The presence of a full detection-tracking pipeline is notable because
it means the video processing is not a single model operating on raw
frames. The pipeline first detects and tracks persons, then runs
the extraction models on the detected regions. This is a standard
computer vision approach, and every component in this preprocessing
chain is open-source.

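At its core, detection-to-track association of the kind BoxMOT
performs reduces to matching boxes between frames by overlap. A toy
greedy IoU matcher (illustrative only; BoxMOT's real trackers add
motion models and appearance features on top of this idea):

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def associate(tracks, detections, thresh=0.3):
    """Greedily match each track's last box to a new detection by IoU."""
    matches, free = {}, set(range(len(detections)))
    for tid, tbox in tracks.items():
        best = max(free, key=lambda j: iou(tbox, detections[j]), default=None)
        if best is not None and iou(tbox, detections[best]) >= thresh:
            matches[tid] = best
            free.discard(best)
    return matches

# Two tracked people; the next frame's detections arrive in swapped order.
tracks = {0: (10, 10, 50, 50), 1: (100, 100, 140, 140)}
dets = [(102, 101, 141, 142), (12, 11, 52, 49)]
```

Identity is preserved across frames even when detections are returned
in a different order, which is exactly the service this preprocessing
stage provides to the per-subject extraction models.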
### 3.7 Edge detection and super resolution

Two additional open-source models were found:

**DexiNed (edge detection).** The module path
`kornia.models.edge_detection.dexined` was found in the binary.
DexiNed (Dense Extreme Inception Network for Edge Detection) is
a CNN-based edge detector. It likely produces edge maps used as
auxiliary input or guidance for other models in the pipeline.

- **Model**: DexiNed (Soria et al., 2020)
- **License**: Apache 2.0 (via Kornia)

**RRDB-Net (super resolution).** The module path
`kornia.models.super_resolution.rrdbnet` was found in the binary.
RRDB-Net (Residual-in-Residual Dense Block Network) is the backbone
of ESRGAN, the widely used super resolution model. This is likely
used to upscale PBR passes to the output resolution.

- **Model**: RRDB-Net / ESRGAN (Wang et al., 2018)
- **License**: Apache 2.0 (via Kornia)

### 3.8 TensorRT plugins and quantized backbones

Several custom TensorRT plugins were found compiled for inference:

- `DisentangledAttention_TRT` -- a custom TRT plugin implementing
  DeBERTa-style disentangled attention (He et al., Microsoft, 2021).
  The `_TRT` suffix indicates this is compiled for production
  inference, not just a bundled library. This suggests a transformer
  component in the pipeline that uses disentangled attention to
  process content and position information separately.

- `GridAnchorRect_TRT` -- anchor generation for object detection.
  Combined with the RT-DETR and face detection references, this
  confirms that the pipeline includes a detection stage.

Multiple backbone architectures were found with TensorRT INT8
quantization and stage-level fusion optimizations:

```
int8_resnet50_stage_1_4_fusion
int8_resnet50_stage_2_fusion
int8_resnet50_stage_3_fusion
int8_resnet34_stage_1_4_fusion
int8_resnet34_stage_2_fusion
int8_resnet34_stage_3_fusion
int8_resnext101_backbone_fusion
```

This shows that ResNet-34, ResNet-50, and ResNeXt-101 are compiled
for inference at INT8 precision with stage-level fusion
optimizations. These are standard pretrained backbones available
from torchvision and timm.

### 3.9 Additional libraries

The binary contains references to supporting libraries that are
standard in ML applications:

| Library | License | Role |
|---------|---------|------|
| PyTorch 2.8.0+cu128 | BSD 3-Clause | Core ML framework |
| TensorRT 10 | NVIDIA proprietary | Model compilation and inference |
| OpenCV 4.11.0.86 (with Qt5, FFmpeg) | Apache 2.0 | Image processing |
| timm 1.0.15 | Apache 2.0 | Model registry and backbones |
| Albumentations | MIT | Image augmentation |
| Pillow | MIT-CMU | Image I/O |
| HuggingFace Hub | Apache 2.0 | Model downloading |
| gdown | MIT | Google Drive file downloading |
| NumPy, SciPy | BSD | Numerical computation |
| Hydra / OmegaConf | MIT | ML configuration management |
| einops | MIT | Tensor manipulation |
| safetensors | Apache 2.0 | Model weight format |
| Flet | Apache 2.0 | Cross-platform GUI framework |
| SoftHSM2 / PKCS#11 | BSD 2-Clause | License token validation |
| OpenSSL 1.1 | Apache 2.0 | Cryptographic operations |
| botocore (AWS SDK) | Apache 2.0 | Cloud connectivity (1,823 service files) |

Three components deserve mention. **botocore** is the AWS SDK core
library, bundled with 1,823 service definition files covering 400+
AWS services. For an application whose product page states "Your
files never leave your machine," the presence of the full AWS SDK
raises questions about what network connectivity the application
maintains. This analysis did not perform network monitoring to
determine what connections, if any, the application makes during
normal operation.

**Pyarmor** (runtime ID `pyarmor_runtime_007423`) is used to encrypt
all of Beeble's custom Python code--every proprietary module is
obfuscated with randomized names and encrypted bytecode. This
prevents static analysis of how models are orchestrated. **Flet** is
the GUI framework powering the Python-side interface.

## 4. Architecture analysis

This section presents evidence about how the PBR decomposition model
is constructed. The findings here are more inferential than those in
section 3--they are based on the absence of expected evidence and the
presence of architectural patterns, rather than on verbatim string
matches. The distinction matters, and we draw it clearly.

### 4.1 What the CVPR 2024 paper describes

The SwitchLight CVPR 2024 paper describes a physics-based inverse
rendering architecture with several dedicated components:

- A **Normal Net** that estimates surface normals
- A **Specular Net** that predicts specular reflectance properties
- Analytical **albedo derivation** using a Cook-Torrance BRDF model
- A **Render Net** that performs the final relighting
- Spherical harmonics for environment lighting representation

This is presented as a unified system where intrinsic decomposition
(breaking an image into its physical components) is an intermediate
step in the relighting pipeline. The paper's novelty claim rests
partly on this physics-driven architecture.

An important caveat: the CVPR paper describes SwitchLight 1.0.
The shipped product is SwitchLight 3.0, which Beeble says went
through two major rebuilds. SwitchLight 2.0 (June 2025) was described
as a "complete architecture rebuild" that removed the alpha mask
requirement and extended from isolated humans to full scenes.
SwitchLight 3.0 (November 2025) was described as a "true video
model" with multi-frame processing, replacing the per-frame
architecture. The paper's physics-based architecture may not reflect
what is currently deployed. The binary analysis that follows applies
to the deployed product, not the CVPR paper.

### 4.2 What the binary contains

A thorough string search of the 2 GB process memory dump and the
56 MB application binary found **zero** matches for the following
terms:

- `cook_torrance`, `cook-torrance`, `Cook_Torrance`, `CookTorrance`
- `brdf`, `BRDF`
- `albedo`
- `specular_net`, `normal_net`, `render_net`
- `lightstage`, `light_stage`, `OLAT`
- `environment_map`, `env_map`, `spherical_harmonic`, `SH_coeff`
- `inverse_rendering`, `intrinsic_decomposition`
- `relight` (as a function or class name)
- `switchlight`, `SwitchLight` (in any capitalization)

Not one of these terms appears anywhere in the application.

The absence of "SwitchLight" deserves emphasis. This term was searched
across three independent codebases:

1. The `beeble-ai` engine binary (56 MB) -- zero matches
2. The `beeble-engine-setup` binary (13 MB) -- zero matches
3. All 667 JavaScript files in the Electron app's `dist/` directory --
   zero matches

"SwitchLight" is purely a marketing name. It does not appear as a
model name, a class name, a configuration key, a log message, or a
comment anywhere in the application. By contrast, open-source
component names appear throughout the binary because they are real
software identifiers used by real code. "SwitchLight" is not used by
any code at all.

This is a significant absence. When an application uses a library or
implements an algorithm, its terminology appears in memory through
function names, variable names, error messages, logging, docstrings,
or class definitions. The open-source components (InSPyReNet, Depth
Anything, DINOv2, RT-DETR, BoxMOT) are all identifiable precisely
because their terminology is present. The physics-based rendering
vocabulary described in the CVPR paper is entirely absent.

There is a caveat: Beeble encrypts its custom Python code with
Pyarmor, which encrypts bytecode and obfuscates module names. If the
Cook-Torrance logic exists only in Pyarmor-encrypted modules, its
terminology would not be visible to string extraction. However,
TensorRT layer names, model checkpoint references, and library-level
strings survive Pyarmor encryption--and none of those contain
physics-based rendering terminology either.

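The zero-match search is straightforward to reproduce: scan each
artifact case-insensitively for every term and record the hits. A
sketch (the helper and the stand-in dump are ours):

```python
def find_terms(data: bytes, terms):
    """Return the subset of terms that occur (case-insensitively) in data."""
    haystack = data.lower()
    return sorted(t for t in terms if t.lower().encode() in haystack)

PHYSICS_TERMS = ["cook_torrance", "brdf", "albedo", "specular_net",
                 "switchlight", "inverse_rendering"]
OSS_TERMS = ["kornia", "dinov2", "rtdetr", "boxmot"]

# Stand-in for a real dump: OSS identifiers present, physics vocabulary absent.
dump = b"kornia.models.tracking.boxmot_tracker\x00dinov2_vitb14_pretrain.pth"
```

Run over the real binaries, the first list returns empty while the
second returns hits throughout, which is the asymmetry the argument
above rests on.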
### 4.3 What the binary contains instead

Where you would expect physics-based rendering components, the
binary shows standard machine learning infrastructure:

- **segmentation_models_pytorch** -- an encoder-decoder segmentation
  framework designed for dense pixel prediction tasks. It provides
  architectures (UNet, FPN, DeepLabV3) that take pretrained encoder
  backbones and learn to predict pixel-level outputs.

- **PP-HGNet, ResNet-34, ResNet-50, ResNeXt-101** -- standard
  pretrained backbone architectures, all available from timm. These
  are the encoders that plug into segmentation_models_pytorch.

- **DINOv2** -- a self-supervised feature extractor that provides
  rich visual features as input to downstream models.

- **DisentangledAttention** -- a transformer attention mechanism,
  compiled as a custom TRT plugin for inference.

This is the standard toolkit for building dense prediction models
in computer vision. You pick an encoder backbone, connect it to a
segmentation decoder, and train the resulting model to predict
whatever pixel-level output you need--whether that is semantic
labels, depth values, or normal vectors.

### 4.4 What the Electron app reveals

The application's Electron shell (the UI layer that orchestrates the
Python engine) is not encrypted and provides clear evidence about
the pipeline structure.

The engine binary receives independent processing flags:

- `--run-alpha` -- generates alpha mattes
- `--run-depth` -- generates depth maps
- `--run-pbr` -- generates BaseColor, Normal, Roughness, Specular,
  Metallic

Each flag can be used in isolation. A user can request alpha without
depth, or depth without PBR. The Electron app constructs these flags
independently based on user selections.

A session-start log entry captured in process memory confirms this
separation:

```json
{
  "extra_command": "--run-pbr --run-alpha --run-depth --save-exr --pbr-stride 1,2 --fps 24.0 --engine-version r1.3.0-m1.1.1"
}
```

The `--pbr-stride 1,2` flag is notable. It indicates that PBR passes
are not processed on every frame--they use a stride, processing a
subset of frames and presumably interpolating the rest. This
contradicts the "true end-to-end video model that understands motion
natively" claim on Beeble's research page. A model that truly
processes video end-to-end would not need to skip frames.

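One plausible reading of `--pbr-stride 1,2` (the exact semantics are
undocumented; this interpretation and all names below are ours) is
that full PBR inference runs on every Nth frame and the skipped
frames are filled by interpolating between neighboring keyframes:

```python
def strided_frames(n_frames, stride):
    """Frame indices that would receive full PBR inference."""
    return list(range(0, n_frames, stride))

def fill_skipped(values, n_frames, stride):
    """Linearly interpolate per-frame scalars for the skipped frames."""
    out = [None] * n_frames
    for k, v in zip(strided_frames(n_frames, stride), values):
        out[k] = v
    for i in range(n_frames):
        if out[i] is None:
            lo = (i // stride) * stride          # previous keyframe
            hi = lo + stride                     # next keyframe, if any
            if hi >= n_frames or out[hi] is None:
                out[i] = out[lo]                 # tail: hold last value
            else:
                t = (i - lo) / (hi - lo)
                out[i] = out[lo] * (1 - t) + out[hi] * t
    return out
```

Whatever the exact scheme, any stride greater than 1 means some
frames' PBR values are synthesized from neighbors rather than
inferred, which is the tension with the "understands motion natively"
claim.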
### 4.5 What this suggests

The evidence points to a specific conclusion: the PBR decomposition
model is most likely a standard encoder-decoder segmentation model
(segmentation_models_pytorch architecture) with pretrained backbones
(PP-HGNet, ResNet, DINOv2), trained on Beeble's private dataset to
predict PBR channels as its output.

This is a common and well-understood approach in computer vision.
You take a pretrained backbone, attach a decoder, and train the whole
model on your task-specific data using task-specific losses. The
Cook-Torrance reflectance model described in the CVPR paper would
then be a *training-time loss function*--used to compute the error
between predicted and ground-truth renders during training--rather
than an architectural component that exists at inference time.

This distinction matters because it changes what "Powered by
SwitchLight 3.0" actually means. The CVPR paper's framing suggests a
novel physics-driven architecture. The binary evidence suggests
standard open-source architectures trained with proprietary data. The
genuine proprietary elements are the training methodology, the
lightstage training data, and the trained weights--not the model
architecture itself.

We want to be clear about the limits of this inference. The Pyarmor
encryption prevents us from seeing the actual pipeline code, and the
TensorRT engines inside the encrypted `.enc` model files do not
expose their internal layer structure through string extraction. It
is possible, though we think unlikely, that the physics-based
rendering code exists entirely within the encrypted layers and uses
no standard terminology. We present this analysis as our best
reading of the available evidence, not as a certainty.

## 5. Code protection

Beeble uses two layers of protection to obscure its pipeline:

**Model encryption.** The six model files are stored as `.enc`
files encrypted with AES. They total 4.3 GB:

| File | Size |
|------|------|
| 97b0085560.enc | 1,877 MB |
| b001322340.enc | 1,877 MB |
| 6edccd5753.enc | 351 MB |
| e710b0c669.enc | 135 MB |
| 0d407dcf32.enc | 111 MB |
| 7f121ea5bc.enc | 49 MB |

The filenames are derived from their SHA-256 hashes. No metadata
in the manifest indicates what each model does. However, comparing
file sizes against known open-source model checkpoints is suggestive:

- The 351 MB file closely matches the size of a DINOv2 ViT-B
  checkpoint (~346 MB for `dinov2_vitb14_pretrain.pth`)
- The two ~1,877 MB files are nearly identical in size (within 1 MB
  of each other), suggesting two variants of the same model compiled
  to TensorRT engines--possibly different precision levels or input
  resolution configurations
- The smaller files (49 MB, 111 MB, 135 MB) are consistent with
  single-task encoder-decoder models compiled to TensorRT with INT8
  quantization

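The size comparison above can be made systematic: compare each `.enc`
file size against a table of known checkpoint sizes and flag
near-matches within a tolerance. A sketch (the only reference size
used here is the DINOv2 ViT-B figure cited above; the helper is
ours):

```python
# Encrypted model sizes observed in the install (MB).
ENC_SIZES_MB = {"6edccd5753.enc": 351, "e710b0c669.enc": 135,
                "0d407dcf32.enc": 111, "7f121ea5bc.enc": 49}

# Known open-source checkpoint sizes to compare against (MB).
KNOWN_CHECKPOINTS_MB = {"dinov2_vitb14_pretrain.pth": 346}

def near_matches(enc_sizes, known, tolerance=0.03):
    """Pair encrypted files with known checkpoints of similar size
    (relative difference within the tolerance)."""
    pairs = []
    for enc, size in enc_sizes.items():
        for name, ref in known.items():
            if abs(size - ref) / ref <= tolerance:
                pairs.append((enc, name))
    return pairs
```

Size matching is circumstantial evidence only: TensorRT compilation
and quantization change file sizes, so a near-match suggests rather
than proves a correspondence.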
**Code obfuscation.** All custom Python code is encrypted with
Pyarmor. Module names are randomized (`q47ne3pa`, `qf1hf17m`,
`vk3zuv58`) and bytecode is decrypted only at runtime. The
application contains approximately 82 obfuscated modules across
three main packages, with the largest single module being 108 KB.

This level of protection is unusual for a desktop application in
the VFX space, and it is worth understanding what it does and does
not hide. Pyarmor prevents reading the pipeline orchestration
code--how models are loaded, connected, and run. But it does not
hide which libraries are loaded into memory, which TensorRT plugins
are compiled, or what command-line interface the engine exposes.
Those are the evidence sources this analysis relies on.

## 6. Beeble's public claims
|
|
|
|
Beeble's marketing consistently attributes the entire Video-to-VFX
|
|
pipeline to SwitchLight. The following are exact quotes from their
|
|
public pages (see [evidence/marketing_claims.md](../evidence/marketing_claims.md)
|
|
for the complete archive).
|
|
|
|
**Beeble Studio product page** (beeble.ai/beeble-studio):
|
|
|
|
> Powered by **SwitchLight 3.0**, convert images and videos into
|
|
> **full PBR passes with alpha and depth maps** for seamless
|
|
> relighting, background removal, and advanced compositing.
|
|
|
|
**SwitchLight 3.0 research page** (beeble.ai/research/switchlight-3-0-is-here):
|
|
|
|
> SwitchLight 3.0 is the best Video-to-PBR model in the world.
|
|
|
|
> SwitchLight 3.0 is a **true end-to-end video model** that
|
|
> understands motion natively.
|
|
|
|
**Documentation FAQ** (docs.beeble.ai/help/faq):
|
|
|
|
On the "What is Video-to-VFX?" question:
|
|
|
|
> **Video-to-VFX** uses our foundation model, **SwitchLight 3.0**,
|
|
> and SOTA AI models to convert your footage into VFX-ready assets.
|
|
|
|
On the "Is Beeble's AI trained responsibly?" question:
|
|
|
|
> When open-source models are included, we choose them
|
|
> carefully--only those with published research papers that disclose
|
|
> their training data and carry valid commercial-use licenses.
|
|
|
|
The FAQ is the only public place where Beeble acknowledges the use
|
|
of open-source models. The product page and research page present
|
|
the entire pipeline as "Powered by SwitchLight 3.0" without
|
|
distinguishing which output passes come from SwitchLight versus
|
|
third-party open-source models.
|
|
|
|
### Investor-facing claims

Beeble raised a $4.75M seed round in July 2024 at a reported $25M
valuation, led by Basis Set Ventures and Fika Ventures. At the time,
the company had approximately 7 employees. Press coverage of the
funding consistently uses language like "foundational model" and
"world-class foundational model in lighting" to describe
SwitchLight--language that implies a novel, proprietary system rather
than a pipeline of open-source components with proprietary weights.

These investor-facing claims were made through public press releases
and coverage, not private communications. They are relevant because
they represent how Beeble chose to characterize its technology to
the market. See [evidence/marketing_claims.md](../evidence/marketing_claims.md)
for archived quotes.

The "true end-to-end video model" claim is particularly difficult
to reconcile with the evidence. The application processes alpha,
depth, and PBR as independent stages using separate CLI flags.
PBR processing uses a frame stride (`--pbr-stride 1,2`), skipping
frames rather than processing video natively. This is a pipeline
of separate models, not an end-to-end video model.
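To make the stride claim concrete, here is a minimal sketch. The exact semantics of `--pbr-stride` are an assumption inferred from the observed flag: a stride of N means full inference runs only on every Nth frame, with the skipped frames filled by duplication or interpolation afterward.

```python
def strided_frames(n_frames: int, stride: int) -> list[int]:
    """Indices of frames that would receive full PBR inference when
    every `stride`-th frame is processed; the remaining frames must
    be reconstructed from their neighbors, not inferred natively."""
    return list(range(0, n_frames, stride))

# stride 1: every frame runs through the model independently
print(strided_frames(6, 1))  # [0, 1, 2, 3, 4, 5]
# stride 2: half the frames are skipped -- per-frame processing
# with frame skipping, not native video inference
print(strided_frames(6, 2))  # [0, 2, 4]
```

Either way, the unit of inference is the individual frame, which is what the stride flag gives away.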
## 7. What Beeble does well

This analysis would be incomplete without acknowledging what is
genuinely Beeble's own work.

**SwitchLight is published research.** The CVPR 2024 paper describes
a real methodology for training intrinsic decomposition models using
lightstage data and physics-based losses. Whether the deployed
architecture matches the paper's description is a separate question
from whether the research itself has merit. It does.

**The trained weights are real work.** If the PBR model is built on
standard architectures (as the evidence suggests), the value lies in
the training data and training process. Acquiring lightstage data,
designing loss functions, and iterating on model quality is
substantial work. Pretrained model weights trained on high-quality
domain-specific data are genuinely valuable, even when the
architecture is standard.

**TensorRT compilation is non-trivial engineering.** Converting
PyTorch models to TensorRT engines with INT8 quantization for
real-time inference requires expertise. The application runs at
interactive speeds on consumer GPUs with 11 GB+ VRAM.
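For readers unfamiliar with why INT8 deployment takes expertise, the sketch below illustrates the core idea of symmetric post-training quantization: FP32 values are mapped to 8-bit integers using a scale derived from a calibration pass. This is a generic NumPy illustration of the technique, not Beeble's actual calibration pipeline.

```python
import numpy as np

def int8_quantize(x: np.ndarray, calibration_max: float):
    """Symmetric INT8 quantization: map FP32 values into [-127, 127]
    using a scale chosen so that the calibration maximum saturates
    the INT8 range. Values beyond the calibration range clip."""
    scale = calibration_max / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate FP32 values; error is bounded by the scale."""
    return q.astype(np.float32) * scale

x = np.array([0.0, 0.5, -1.0, 2.0], dtype=np.float32)
q, scale = int8_quantize(x, calibration_max=2.0)
# Round-trip error stays within one quantization step (the scale)
assert np.allclose(int8_dequantize(q, scale), x, atol=scale)
```

The hard part in practice is choosing calibration data so that per-layer activation ranges are representative; a bad calibration set degrades quality in exactly the subtle, content-dependent ways that matter for PBR passes.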
**The product is a real product.** The desktop application, Nuke/
Blender/Unreal integrations, cloud API, render queue, EXR output
with ACEScg color space support, and overall UX represent
substantial product engineering.
## 8. The real question

Most Beeble Studio users use the application for PBR extractions:
alpha mattes, diffuse/albedo, normals, and depth maps. The
relighting features exist but are secondary to the extraction
workflow for much of the user base.

The alpha and depth extractions are produced by open-source models
used off the shelf. They can be replicated for free using the exact
same libraries.

The PBR extractions (normal, base color, roughness, specular,
metallic) use models whose trained weights are proprietary, but
whose architecture appears to be built from the same open-source
frameworks available to anyone. Open-source alternatives for PBR
decomposition now exist (CHORD from Ubisoft, RGB-X from Adobe) and
are narrowing the quality gap, though they were trained on different
data and may perform differently on portrait subjects.

See [COMFYUI_GUIDE.md](COMFYUI_GUIDE.md) for a detailed guide on
replicating each stage of the pipeline with open-source tools.
There is a common assumption that the training data represents a
significant barrier to replication--that lightstage captures are
expensive and rare, and therefore the trained weights are uniquely
valuable. As of late 2025, this assumption is increasingly difficult
to sustain.

Multiple public datasets now provide the kind of paired image +
ground-truth PBR data needed for training:

- **POLAR** (December 2025): 220 subjects, 156 light directions, 32
  views, 4K resolution, 28.8 million images. This is comparable in
  scale to the 287 subjects cited in Beeble's CVPR paper.
- **HumanOLAT** (ICCV 2025): the first public full-body lightstage
  dataset, with 21 subjects and 331 OLAT lighting conditions.
- **OpenHumanBRDF** (July 2025): 147 human models with full PBR
  properties (diffuse, specular, SSS) in Blender.
- **MatSynth** (CVPR 2024): 433 GB of CC0/CC-BY PBR material maps,
  used to train Ubisoft's CHORD model.
Published results further undermine the lightstage data moat.
**SynthLight** (CVPR 2025) was trained purely on ~350 synthetic
Blender heads and matched the quality of lightstage-trained methods.
**NVIDIA Lumos** (SIGGRAPH Asia 2022) matched the state of the art
with 300,000 synthetic samples. **DiFaReli++** outperformed
lightstage baselines using only 2D internet images.

The cost estimates are modest. Ubisoft's CHORD model was trained in
5.2 days on a single H100 GPU (~$260-500 in cloud compute). A full
replication effort--synthetic dataset generation plus model training
--has been estimated at $4,500-$18,000, a fraction of Beeble's $4.75M
seed round.
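The compute figure is easy to sanity-check. The $2-4/hour H100 rates below are an assumption about typical 2025 cloud pricing (marketplace versus on-demand), not a quoted price:

```python
hours = 5.2 * 24     # 5.2 days of single-GPU training, in GPU-hours
low = hours * 2.00   # assumed ~$2/hr marketplace H100 pricing
high = hours * 4.00  # assumed ~$4/hr on-demand H100 pricing
print(f"{hours:.1f} GPU-hours -> ${low:.0f}-${high:.0f}")
# 124.8 GPU-hours -> $250-$499
```

which is in line with the cited ~$260-500 range.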
Note: Unreal Engine MetaHumans, while visually excellent, cannot
legally be used for AI training. Epic's MetaHuman EULA explicitly
prohibits "using the Licensed Technology as a training input...into
any Generative AI Program." Blender with the MPFB2 plugin is a
viable alternative for synthetic data generation without license
restrictions.
The competitive landscape shifted significantly in 2025. NVIDIA's
**DiffusionRenderer** (a CVPR 2025 Oral, the top tier of accepted
papers) performs both inverse rendering (video → PBR maps) and
forward rendering (PBR maps + lighting → relit output) using video
diffusion models. It is open source (Apache 2.0 code, NVIDIA Open
Model License for weights) and has a ComfyUI integration. It is the
first open-source system that directly replicates Beeble's entire
core pipeline, including relighting, backed by NVIDIA's resources.
See [COMFYUI_GUIDE.md](COMFYUI_GUIDE.md) for integration details.
No patent applications were found for Beeble or its founders related
to SwitchLight, relighting, or inverse rendering (searched USPTO and
Google Patents, January 2026; note the 18-month publication delay for
recent filings). The CVPR 2024 paper has no associated code release.
Together with the architecture findings in section 4, this suggests
limited defensibility against open-source replication.

None of this means Beeble has no value. Convenience, polish, and
integration are real things people pay for. But the gap between what
the marketing says ("Powered by SwitchLight 3.0") and what the
application contains (a pipeline of mostly open-source components,
some used directly and others used as architectural building blocks)
is wider than users would reasonably expect. And the technical moat
may be thinner than investors were led to believe.
## 9. License compliance

All identified open-source components require attribution in
redistributed software. Both the MIT License and the Apache License
2.0 require that copyright notices and license texts be included
with any distribution of the software.

No such attribution was found in Beeble Studio's application,
documentation, or user-facing materials.

The scope of the issue extends beyond the core models. The
application bundles approximately 48 Python packages in its `lib/`
directory. Of these, only 6 include LICENSE files (cryptography,
gdown, MarkupSafe, numpy, openexr, triton). The remaining 42
packages--including PyTorch, Kornia, Pillow, and others with
attribution requirements--ship no license files in the distribution.
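A check of this kind is easy to reproduce. The sketch below assumes a flat `lib/` layout with one directory per package (adjust the path for the actual distribution) and flags every bundled package that ships no license file:

```python
from pathlib import Path

# Common filenames that satisfy MIT/Apache attribution requirements
LICENSE_NAMES = ("LICENSE", "LICENCE", "COPYING", "NOTICE")

def packages_missing_licenses(lib_dir: str) -> list[str]:
    """Return names of package directories under lib_dir that contain
    no LICENSE/COPYING/NOTICE file anywhere in their tree."""
    missing = []
    for pkg in sorted(p for p in Path(lib_dir).iterdir() if p.is_dir()):
        has_license = any(
            f.is_file() and f.name.upper().startswith(LICENSE_NAMES)
            for f in pkg.rglob("*")
        )
        if not has_license:
            missing.append(pkg.name)
    return missing
```

Running a scan like this against the installed application is one of the steps users can take to verify the 42-of-48 figure themselves.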
For a detailed analysis of each license's requirements and what
compliance would look like, see
[LICENSE_ANALYSIS.md](LICENSE_ANALYSIS.md).
## 10. Conclusion

Beeble Studio's Video-to-VFX pipeline is a collection of independent
models, most built from open-source components.

The preprocessing stages are entirely open-source: background removal
(InSPyReNet), depth estimation (Depth Anything V2), person detection
(RT-DETR with PP-HGNet), face detection (Kornia), multi-object
tracking (BoxMOT), edge detection (DexiNed), and super-resolution
(RRDB-Net). The PBR decomposition models appear to be built on
open-source architectural frameworks (segmentation_models_pytorch,
timm backbones) with proprietary trained weights.

The name "SwitchLight" does not appear anywhere in the application--
not in the engine binary, not in the setup binary, not in the
Electron app's 667 JavaScript files. It is a marketing name that
refers to no identifiable software component.

The CVPR 2024 paper describes a physics-based inverse rendering
architecture for SwitchLight 1.0. The deployed product is SwitchLight
3.0, which went through at least two "complete architecture rebuilds."
The application contains no evidence of physics-based rendering code
at inference time. This could mean that the physics (Cook-Torrance
rendering) was used during training as a loss function, that the
architecture was replaced during the rebuilds, or both.

Beeble's marketing attributes the entire pipeline to SwitchLight
3.0. The evidence shows that alpha mattes come from InSPyReNet, depth
maps come from Depth Anything V2, person detection comes from
RT-DETR, tracking comes from BoxMOT, and the PBR models are built on
segmentation_models_pytorch with PP-HGNet and ResNet backbones. The
"true end-to-end video model" claim is contradicted by the
independent processing flags and the frame-stride parameter observed
in the application.

Of the approximately 48 Python packages bundled with the application,
only 6 include license files. The core open-source models' licenses
require attribution that does not appear to be provided.

These findings can be independently verified using the methods
described in [VERIFICATION_GUIDE.md](VERIFICATION_GUIDE.md).