# Beeble Studio: Technical Analysis

**Date**: January 2026
**Subject**: Beeble Studio desktop application (Linux x86_64 RPM)
**Scope**: Identification of third-party components and architectural analysis of the application's AI pipeline

## 1. Introduction

Beeble Studio is a desktop application for VFX professionals that generates physically-based rendering (PBR) passes from video footage. It produces alpha mattes (background removal), depth maps, normal maps, base color, roughness, specular, and metallic passes, along with AI-driven relighting capabilities. Beeble markets its pipeline as "Powered by SwitchLight 3.0," their proprietary video-to-PBR model published at CVPR 2024. The application is sold as a subscription product, with plans starting at $42/month.

This analysis was prompted by observing that several of Beeble Studio's output passes closely resemble the outputs of well-known open-source models. Standard forensic techniques--string extraction from process memory, TensorRT plugin analysis, PyInstaller module listing, Electron app inspection, and manifest analysis--were used to determine which components the application actually contains and how they are organized.

## 2. Findings summary

The analysis identified four open-source models used directly for user-facing outputs, a complete open-source detection and tracking pipeline used for preprocessing, additional open-source architectural components, and a proprietary model whose architecture raises questions about how "proprietary" should be understood.
| Pipeline stage | Component | License | Open source |
|---------------|-----------|---------|-------------|
| Background removal (alpha) | transparent-background / InSPyReNet | MIT | Yes |
| Depth estimation | Depth Anything V2 via Kornia | Apache 2.0 | Yes |
| Person detection | RT-DETR via Kornia | Apache 2.0 | Yes |
| Face detection | Kornia face detection | Apache 2.0 | Yes |
| Multi-object tracking | BoxMOT via Kornia | MIT | Yes |
| Edge detection | DexiNed via Kornia | Apache 2.0 | Yes |
| Feature extraction | DINOv2 via timm | Apache 2.0 | Yes |
| Segmentation | segmentation_models_pytorch | MIT | Yes |
| Backbone architecture | PP-HGNet via timm | Apache 2.0 | Yes |
| Super resolution | RRDB-Net via Kornia | Apache 2.0 | Yes |
| PBR decomposition / relighting | SwitchLight 3.0 | Proprietary | See section 4 |

The preprocessing pipeline--background removal, depth estimation, feature extraction, segmentation--is composed entirely of open-source models used off the shelf.

The PBR decomposition and relighting stage is marketed as "SwitchLight 3.0." The CVPR 2024 paper describes it as a physics-based inverse rendering system with dedicated sub-networks (Normal Net, Specular Net) and a Cook-Torrance reflectance model. However, the application binary contains no references to any of this physics-based terminology, and the architectural evidence suggests the models are built from standard encoder-decoder segmentation frameworks with pretrained backbones from timm. This is discussed in detail in section 4.
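The string-extraction technique behind these findings can be sketched in a few lines of Python. This is an illustrative reconstruction, not the actual tooling used in the analysis; the marker list and the synthetic blob are examples.

```python
# Illustrative sketch of string extraction: count occurrences of known
# library identifiers in a raw binary or memory dump. The marker list and
# sample blob are examples, not the full set used in this analysis.
import re

MARKERS = [
    b"transparent_background",            # InSPyReNet wrapper package
    b"kornia.models.detection.rtdetr",    # RT-DETR via Kornia
    b"dinov2",                            # Meta's DINOv2
    b"segmentation_models_pytorch",       # encoder-decoder framework
    b"SwitchLight",                       # the marketing name under test
]

def count_markers(blob: bytes) -> dict[str, int]:
    """Map each marker to its number of occurrences in the blob."""
    return {m.decode(): len(re.findall(re.escape(m), blob)) for m in MARKERS}

# Tiny synthetic blob standing in for a real dump
# (in practice: Path("memdump.bin").read_bytes()):
blob = b"\x00kornia.models.detection.rtdetr\x00dinov2_vitb14_pretrain.pth\x00"
hits = count_markers(blob)
```

A scan of this shape over the real binaries is what yields results like the zero matches for `SwitchLight` reported in section 4.2.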
The reconstructed pipeline architecture:

```
Input Video Frame
  |
  +--[RT-DETR + PP-HGNet]-----------> Person Detection
  |        |
  |        +--[BoxMOT]---------------> Tracking (multi-frame)
  |
  +--[Face Detection]---------------> Face Regions
  |
  +--[InSPyReNet]-------------------> Alpha Matte
  |
  +--[Depth Anything V2]------------> Depth Map
  |
  +--[DINOv2]-------> Feature Maps
  |        |
  |        +--[segmentation_models_pytorch]---> Segmentation
  |
  +--[DexiNed]----------------------> Edge Maps
  |
  +--[SMP encoder-decoder + PP-HGNet/ResNet backbone]
  |        |
  |        +----> Normal Map
  |        +----> Base Color
  |        +----> Roughness
  |        +----> Specular
  |        +----> Metallic
  |
  +--[RRDB-Net]---------------------> Super Resolution
  |
  +--[Relighting model]-------------> Relit Output
```

Each stage runs independently. The Electron app passes separate CLI flags (`--run-alpha`, `--run-depth`, `--run-pbr`) to the engine binary, and each flag can be used in isolation. This is not a unified end-to-end model--it is a pipeline of independent models. The detection and tracking stages (RT-DETR, BoxMOT, face detection) serve as preprocessing--locating and tracking subjects across frames before the extraction models run.

## 3. Evidence for each component

### 3.1 Background removal: transparent-background / InSPyReNet

The complete API docstring for the `transparent-background` Python package was found verbatim in process memory:

```
Args:
    img (PIL.Image or np.ndarray): input image
    type (str): output type option as below.
        'rgba' will generate RGBA output regarding saliency score as an alpha map.
        'green' will change the background with green screen.
        'white' will change the background with white color.
        '[255, 0, 0]' will change the background with color code [255, 0, 0].
        'blur' will blur the background.
        'overlay' will cover the salient object with translucent green color, and highlight the edges.
Returns:
    PIL.Image: output image
```

This is a character-for-character match with the docstring published at https://github.com/plemeri/transparent-background.
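The "character-for-character match" claim amounts to a whitespace-insensitive containment check, which can be sketched as below. The dump text here is an abbreviated, illustrative stand-in for real string-extractor output.

```python
# Sketch: verify that a published docstring fragment appears verbatim in
# extracted strings, ignoring only line-wrapping differences. The dump text
# below is an abbreviated stand-in for real extractor output.
def contains_verbatim(dump_text: str, fragment: str) -> bool:
    """True if `fragment` occurs in `dump_text` after collapsing whitespace."""
    collapse = lambda s: " ".join(s.split())
    return collapse(fragment) in collapse(dump_text)

PUBLISHED = ("'rgba' will generate RGBA output "
             "regarding saliency score as an alpha map.")

dump = """type (str): output type option as below.
    'rgba' will generate RGBA output
    regarding saliency score as an alpha map."""

match = contains_verbatim(dump, PUBLISHED)  # True for this sample dump
```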
Additionally, TensorRT layer names found in the binary correspond to Res2Net bottleneck blocks (`RnRes2Br1Br2c_TRT`, `RnRes2Br2bBr2c_TRT`, `RnRes2FullFusion_TRT`); Res2Net is the backbone architecture used by InSPyReNet. The `transparent_background.backbones.SwinTransformer` module path was also found in the PyInstaller bundle's module list.

- **Library**: transparent-background (`pip install transparent-background`)
- **Model**: InSPyReNet (Kim et al., ACCV 2022)
- **License**: MIT
- **Paper**: https://arxiv.org/abs/2209.09475
- **Repository**: https://github.com/plemeri/transparent-background

### 3.2 Depth estimation: Depth Anything V2

The complete API documentation for Depth Anything V2's ONNX export interface was found in process memory:

```
Export a DepthAnything model to an ONNX model file.

Args:
    model_name: The name of the model to be loaded. Valid model names include:
        - `depth-anything-v2-small`
        - `depth-anything-v2-base`
        - `depth-anything-v2-large`
    model_type: The type of the model to be loaded. Valid model types include:
        - `model`
        - `model_bnb4`
        - `model_fp16`
        - `model_int8`
```

This is accessed through Kornia's ONNX builder interface (`kornia.onnx.DepthAnythingONNXBuilder`), with 50+ additional references to Kornia's tutorials and modules throughout the binary.
- **Library**: Kornia (`pip install kornia`)
- **Model**: Depth Anything V2 (Yang et al., 2024)
- **License**: Apache 2.0
- **Paper**: https://arxiv.org/abs/2406.09414
- **Repository**: https://github.com/kornia/kornia

### 3.3 Feature extraction: DINOv2

Multiple references to DINOv2 were found across the application:

- Runtime warning: `WARNING:dinov2:xFormers not available` (captured from application output during normal operation)
- Model checkpoint URLs: `dinov2_vits14_pretrain.pth`, `dinov2_vitb14_pretrain.pth` (Meta's public model hosting)
- timm model registry name: `vit_large_patch14_dinov2.lvd142m`
- File path: `/mnt/work/Beeble_Models/lib/timm/models/hrnet.py` (timm library bundled in the application)

DINOv2 is Meta's self-supervised vision transformer. It does not produce a user-facing output directly--it generates feature maps that feed into downstream models. This is a standard pattern in modern computer vision: use a large pretrained backbone for feature extraction, then train smaller task-specific heads on top.

- **Library**: timm (`pip install timm`)
- **Model**: DINOv2 (Oquab et al., Meta AI, 2023)
- **License**: Apache 2.0
- **Paper**: https://arxiv.org/abs/2304.07193
- **Repository**: https://github.com/huggingface/pytorch-image-models

### 3.4 Segmentation: segmentation_models_pytorch

A direct reference to the library's GitHub repository, along with encoder/decoder architecture parameters and decoder documentation, was found in process memory:

```
encoder_name: Name of the encoder to use.
encoder_depth: Depth of the encoder.
decoder_channels: Number of channels in the decoder.
decoder_name: What decoder to use.
    https://github.com/qubvel-org/segmentation_models.pytorch/
    tree/main/segmentation_models_pytorch/decoders
Note: Only encoder weights are available. Pretrained weights
for the whole model are not available.
```

This library is a framework for building encoder-decoder segmentation models.
It is not a model itself--it provides the architecture (UNet, FPN, DeepLabV3, etc.) into which you plug a pretrained encoder backbone (ResNet, EfficientNet, etc.) and train the decoder on your own data for your specific task. Its presence alongside the pretrained backbones described below suggests it serves as the architectural foundation for one or more of the PBR output models. This is discussed further in section 4.

- **Library**: segmentation_models_pytorch (`pip install segmentation-models-pytorch`)
- **License**: MIT
- **Repository**: https://github.com/qubvel-org/segmentation_models.pytorch

### 3.5 Backbone: PP-HGNet

The `HighPerfGpuNet` class was found in process memory along with its full structure:

```
HighPerfGpuNet
HighPerfGpuNet.forward_features
HighPerfGpuNet.reset_classifier
HighPerfGpuBlock.__init__
LearnableAffineBlock
ConvBNAct.__init__
StemV1.__init__
StemV1.forward
_create_hgnetr
```

This is PP-HGNet (PaddlePaddle High Performance GPU Network), ported to timm's model registry. Documentation strings confirm the identity:

```
PP-HGNet (V1 & V2)
PP-HGNetv2: https://github.com/PaddlePaddle/PaddleClas/
    .../pp_hgnet_v2.py
```

PP-HGNet is a convolutional backbone architecture designed for fast GPU inference, originally developed for Baidu's RT-DETR real-time object detection system. It is available as a pretrained backbone through timm and is commonly used as an encoder in larger models.

PP-HGNet serves a dual role in the Beeble pipeline. First, it functions as the backbone encoder for the RT-DETR person detection model (see section 3.6). Second, based on the co-presence of `segmentation_models_pytorch` and compatible encoder interfaces, it likely serves as one of the backbone encoders for the PBR decomposition models. This dual use is standard--the same pretrained backbone can be loaded into different model architectures for different tasks.
- **Library**: timm (`pip install timm`)
- **Model**: PP-HGNet (Baidu/PaddlePaddle)
- **License**: Apache 2.0
- **Repository**: https://github.com/huggingface/pytorch-image-models

### 3.6 Detection and tracking pipeline

The binary contains a complete person detection and tracking pipeline built from open-source models accessed through Kornia.

**RT-DETR (Real-Time Detection Transformer).** Full module paths for RT-DETR were found in the binary:

```
kornia.contrib.models.rt_detr.architecture.hgnetv2
kornia.contrib.models.rt_detr.architecture.resnet_d
kornia.contrib.models.rt_detr.architecture.rtdetr_head
kornia.contrib.models.rt_detr.architecture.hybrid_encoder
kornia.models.detection.rtdetr
```

RT-DETR model configuration strings confirm the PP-HGNet connection:

```
Configuration to construct RT-DETR model.
- HGNetV2-L: 'hgnetv2_l' or RTDETRModelType.hgnetv2_l
- HGNetV2-X: 'hgnetv2_x' or RTDETRModelType.hgnetv2_x
```

RT-DETR is Baidu's real-time object detection model, published at ICLR 2024. It detects and localizes objects (including persons) in images. In Beeble's pipeline, it likely serves as the initial stage that identifies which regions of the frame contain subjects to process.

- **Model**: RT-DETR (Zhao et al., 2024)
- **License**: Apache 2.0 (via Kornia)
- **Paper**: https://arxiv.org/abs/2304.08069

**Face detection.** The `kornia.contrib.face_detection` module and `kornia.contrib.FaceDetectorResult` class were found in the binary. This provides face region detection, likely used to guide the PBR models in handling facial features (skin, eyes, hair) differently from other body parts or clothing.

**BoxMOT (multi-object tracking).** The module path `kornia.models.tracking.boxmot_tracker` was found in the binary. BoxMOT is a multi-object tracking library that maintains identity across video frames--given detections from RT-DETR on each frame, BoxMOT tracks which detection corresponds to which person over time.
- **Repository**: https://github.com/mikel-brostrom/boxmot
- **License**: MIT (AGPL-3.0 for some trackers)

The presence of a full detection-tracking pipeline is notable because it means the video processing is not a single model operating on raw frames. The pipeline first detects and tracks persons, then runs the extraction models on the detected regions. This is a standard computer vision approach, and every component in this preprocessing chain is open-source.

### 3.7 Edge detection and super resolution

Two additional open-source models were found:

**DexiNed (edge detection).** The module path `kornia.models.edge_detection.dexined` was found in the binary. DexiNed (Dense Extreme Inception Network for Edge Detection) is a CNN-based edge detector. It likely produces edge maps used as auxiliary input or guidance for other models in the pipeline.

- **Model**: DexiNed (Soria et al., 2020)
- **License**: Apache 2.0 (via Kornia)

**RRDB-Net (super resolution).** The module path `kornia.models.super_resolution.rrdbnet` was found in the binary. RRDB-Net (Residual-in-Residual Dense Block Network) is the backbone of ESRGAN, the widely used super resolution model. This is likely used to upscale PBR passes to the output resolution.

- **Model**: RRDB-Net / ESRGAN (Wang et al., 2018)
- **License**: Apache 2.0 (via Kornia)

### 3.8 TensorRT plugins and quantized backbones

Several custom TensorRT plugins were found compiled for inference:

- `DisentangledAttention_TRT` -- a custom TRT plugin implementing DeBERTa-style disentangled attention (He et al., Microsoft, 2021). The `_TRT` suffix indicates this is compiled for production inference, not just a bundled library. This suggests a transformer component in the pipeline that uses disentangled attention to process content and position information separately.
- `GridAnchorRect_TRT` -- anchor generation for object detection.
Combined with the RT-DETR and face detection references, this confirms that the pipeline includes a detection stage.

Multiple backbone architectures were found with TensorRT INT8 quantization and stage-level fusion optimizations:

```
int8_resnet50_stage_1_4_fusion
int8_resnet50_stage_2_fusion
int8_resnet50_stage_3_fusion
int8_resnet34_stage_1_4_fusion
int8_resnet34_stage_2_fusion
int8_resnet34_stage_3_fusion
int8_resnext101_backbone_fusion
```

This shows that ResNet-34, ResNet-50, and ResNeXt-101 are compiled for INT8 inference with stage-level fusion. These are standard pretrained backbones available from torchvision and timm.

### 3.9 Additional libraries

The binary contains references to supporting libraries that are standard in ML applications:

| Library | License | Role |
|---------|---------|------|
| PyTorch 2.8.0+cu128 | BSD 3-Clause | Core ML framework |
| TensorRT 10 | NVIDIA proprietary | Model compilation and inference |
| OpenCV 4.11.0.86 (with Qt5, FFmpeg) | Apache 2.0 | Image processing |
| timm 1.0.15 | Apache 2.0 | Model registry and backbones |
| Albumentations | MIT | Image augmentation |
| Pillow | MIT-CMU | Image I/O |
| HuggingFace Hub | Apache 2.0 | Model downloading |
| gdown | MIT | Google Drive file downloading |
| NumPy, SciPy | BSD | Numerical computation |
| Hydra / OmegaConf | MIT | ML configuration management |
| einops | MIT | Tensor manipulation |
| safetensors | Apache 2.0 | Model weight format |
| Flet | Apache 2.0 | Cross-platform GUI framework |
| SoftHSM2 / PKCS#11 | BSD 2-Clause | License token validation |
| OpenSSL 1.1 | Apache 2.0 | Cryptographic operations |

Two components deserve further mention. **Pyarmor** (runtime ID `pyarmor_runtime_007423`) is used to encrypt all of Beeble's custom Python code--every proprietary module is obfuscated with randomized names and encrypted bytecode. This prevents static analysis of how models are orchestrated.
**Flet** is the GUI framework powering the Python-side interface.

## 4. Architecture analysis

This section presents evidence about how the PBR decomposition model is constructed. The findings here are more inferential than those in section 3--they are based on the absence of expected evidence and the presence of architectural patterns, rather than on verbatim string matches. The distinction matters, and we draw it clearly.

### 4.1 What the CVPR 2024 paper describes

The SwitchLight CVPR 2024 paper describes a physics-based inverse rendering architecture with several dedicated components:

- A **Normal Net** that estimates surface normals
- A **Specular Net** that predicts specular reflectance properties
- Analytical **albedo derivation** using a Cook-Torrance BRDF model
- A **Render Net** that performs the final relighting
- Spherical harmonics for environment lighting representation

This is presented as a unified system where intrinsic decomposition (breaking an image into its physical components) is an intermediate step in the relighting pipeline. The paper's novelty claim rests partly on this physics-driven architecture.

### 4.2 What the binary contains

A thorough string search of the 2 GB process memory dump and the 56 MB application binary found **zero** matches for the following terms:

- `cook_torrance`, `cook-torrance`, `Cook_Torrance`, `CookTorrance`
- `brdf`, `BRDF`
- `albedo`
- `specular_net`, `normal_net`, `render_net`
- `lightstage`, `light_stage`, `OLAT`
- `environment_map`, `env_map`, `spherical_harmonic`, `SH_coeff`
- `inverse_rendering`, `intrinsic_decomposition`
- `relight` (as a function or class name)
- `switchlight`, `SwitchLight` (in any capitalization)

Not one of these terms appears anywhere in the application.

The absence of "SwitchLight" deserves emphasis. The term was searched across three independent codebases:

1. The `beeble-ai` engine binary (56 MB) -- zero matches
2. The `beeble-engine-setup` binary (13 MB) -- zero matches
3. All 667 JavaScript files in the Electron app's `dist/` directory -- zero matches

"SwitchLight" is purely a marketing name. It does not appear as a model name, a class name, a configuration key, a log message, or a comment anywhere in the application. By contrast, open-source component names appear throughout the binary because they are real software identifiers used by real code. "SwitchLight" is not used by any code at all.

This is a significant absence. When an application uses a library or implements an algorithm, its terminology appears in memory through function names, variable names, error messages, logging, docstrings, or class definitions. The open-source components (InSPyReNet, Depth Anything, DINOv2, RT-DETR, BoxMOT) are all identifiable precisely because their terminology is present. The physics-based rendering vocabulary described in the CVPR paper is entirely absent.

There is a caveat: Beeble encrypts its custom Python code with Pyarmor, which encrypts bytecode and obfuscates module names. If the Cook-Torrance logic exists only in Pyarmor-encrypted modules, its terminology would not be visible to string extraction. However, TensorRT layer names, model checkpoint references, and library-level strings survive Pyarmor encryption--and none of those contain physics-based rendering terminology either.

### 4.3 What the binary contains instead

Where you would expect physics-based rendering components, the binary shows standard machine learning infrastructure:

- **segmentation_models_pytorch** -- an encoder-decoder segmentation framework designed for dense pixel prediction tasks. It provides architectures (UNet, FPN, DeepLabV3) that take pretrained encoder backbones and learn to predict pixel-level outputs.
- **PP-HGNet, ResNet-34, ResNet-50, ResNeXt-101** -- standard pretrained backbone architectures, all available from timm. These are the encoders that plug into segmentation_models_pytorch.
- **DINOv2** -- a self-supervised feature extractor that provides rich visual features as input to downstream models.
- **DisentangledAttention** -- a transformer attention mechanism, compiled as a custom TRT plugin for inference.

This is the standard toolkit for building dense prediction models in computer vision. You pick an encoder backbone, connect it to a segmentation decoder, and train the resulting model to predict whatever pixel-level output you need--whether that is semantic labels, depth values, or normal vectors.

### 4.4 What the Electron app reveals

The application's Electron shell (the UI layer that orchestrates the Python engine) is not encrypted and provides clear evidence about the pipeline structure. The engine binary receives independent processing flags:

- `--run-alpha` -- generates alpha mattes
- `--run-depth` -- generates depth maps
- `--run-pbr` -- generates BaseColor, Normal, Roughness, Specular, Metallic

Each flag can be used in isolation. A user can request alpha without depth, or depth without PBR. The Electron app constructs these flags independently based on user selections.

A session-start log entry captured in process memory confirms this separation:

```json
{
  "extra_command": "--run-pbr --run-alpha --run-depth --save-exr --pbr-stride 1,2 --fps 24.0 --engine-version r1.3.0-m1.1.1"
}
```

The `--pbr-stride 1,2` flag is notable. It indicates that PBR passes are not computed on every frame--the engine processes a strided subset of frames and presumably interpolates the rest. This contradicts the "true end-to-end video model that understands motion natively" claim on Beeble's research page. A model that truly processes video end-to-end would not need to skip frames.
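The per-pass flag structure described above can be mirrored in a few lines. The helper below is hypothetical--a sketch of how any orchestration layer might assemble such flags--but the flag names themselves (`--run-alpha`, `--run-depth`, `--run-pbr`, `--pbr-stride`, `--fps`) come from the captured session log.

```python
# Hypothetical sketch of assembling the engine flags observed in the log.
# Flag names are taken from the captured session entry; the function and its
# defaults are illustrative, not recovered code.
def build_engine_args(alpha: bool = False, depth: bool = False,
                      pbr: bool = False, pbr_stride: str = "1,2",
                      fps: float = 24.0) -> list[str]:
    """Build the argument list for one engine invocation."""
    args: list[str] = []
    if pbr:
        args += ["--run-pbr", "--pbr-stride", pbr_stride]
    if alpha:
        args.append("--run-alpha")
    if depth:
        args.append("--run-depth")
    return args + ["--fps", str(fps)]

# Each pass can be requested in isolation, matching the observed behavior:
alpha_only = build_engine_args(alpha=True)
```

The point of the sketch is the independence: nothing forces the passes to run together, which is what the captured logs show.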
### 4.5 What this suggests

The evidence points to a specific conclusion: the PBR decomposition model is most likely a standard encoder-decoder segmentation model (segmentation_models_pytorch architecture) with pretrained backbones (PP-HGNet, ResNet, DINOv2), trained on Beeble's private dataset to predict PBR channels as its output.

This is a common and well-understood approach in computer vision. You take a pretrained backbone, attach a decoder, and train the whole model on your task-specific data using task-specific losses. The Cook-Torrance reflectance model described in the CVPR paper would then be a *training-time loss function*--used to compute the error between predicted and ground-truth renders during training--rather than an architectural component that exists at inference time.

This distinction matters because it changes what "Powered by SwitchLight 3.0" actually means. The CVPR paper's framing suggests a novel physics-driven architecture. The binary evidence suggests standard open-source architectures trained with proprietary data. The genuine proprietary elements are the training methodology, the lightstage training data, and the trained weights--not the model architecture itself.

We want to be clear about the limits of this inference. The Pyarmor encryption prevents us from seeing the actual pipeline code, and the TensorRT engines inside the encrypted `.enc` model files do not expose their internal layer structure through string extraction. It is possible, though we think unlikely, that the physics-based rendering code exists entirely within the encrypted layers and uses no standard terminology. We present this analysis as our best reading of the available evidence, not as a certainty.

## 5. Code protection

Beeble uses two layers of protection to obscure its pipeline:

**Model encryption.** The six model files are stored as `.enc` files encrypted with AES.
They total 4.4 GB:

| File | Size |
|------|------|
| 97b0085560.enc | 1,877 MB |
| b001322340.enc | 1,877 MB |
| 6edccd5753.enc | 351 MB |
| e710b0c669.enc | 135 MB |
| 0d407dcf32.enc | 111 MB |
| 7f121ea5bc.enc | 49 MB |

The filenames are derived from their SHA-256 hashes. No metadata in the manifest indicates what each model does. However, comparing file sizes against known open-source model checkpoints is suggestive:

- The 351 MB file closely matches the size of a DINOv2 ViT-B checkpoint (~346 MB for `dinov2_vitb14_pretrain.pth`)
- The two ~1,877 MB files are nearly identical in size (within 1 MB of each other), suggesting two variants of the same model compiled to TensorRT engines--possibly different precision levels or input resolution configurations
- The smaller files (49 MB, 111 MB, 135 MB) are consistent with single-task encoder-decoder models compiled to TensorRT with INT8 quantization

**Code obfuscation.** All custom Python code is encrypted with Pyarmor. Module names are randomized (`q47ne3pa`, `qf1hf17m`, `vk3zuv58`) and bytecode is decrypted only at runtime. The application contains approximately 82 obfuscated modules across three main packages, with the largest single module being 108 KB.

This level of protection is unusual for a desktop application in the VFX space, and it is worth understanding what it does and does not hide. Pyarmor prevents reading the pipeline orchestration code--how models are loaded, connected, and run. But it does not hide which libraries are loaded into memory, which TensorRT plugins are compiled, or what command-line interface the engine exposes. Those are the evidence sources this analysis relies on.

## 6. Beeble's public claims

Beeble's marketing consistently attributes the entire Video-to-VFX pipeline to SwitchLight. The following are exact quotes from their public pages (see [evidence/marketing_claims.md](../evidence/marketing_claims.md) for the complete archive).
**Beeble Studio product page** (beeble.ai/beeble-studio):

> Powered by **SwitchLight 3.0**, convert images and videos into
> **full PBR passes with alpha and depth maps** for seamless
> relighting, background removal, and advanced compositing.

**SwitchLight 3.0 research page** (beeble.ai/research/switchlight-3-0-is-here):

> SwitchLight 3.0 is the best Video-to-PBR model in the world.
> SwitchLight 3.0 is a **true end-to-end video model** that
> understands motion natively.

**Documentation FAQ** (docs.beeble.ai/help/faq):

On the "What is Video-to-VFX?" question:

> **Video-to-VFX** uses our foundation model, **SwitchLight 3.0**,
> and SOTA AI models to convert your footage into VFX-ready assets.

On the "Is Beeble's AI trained responsibly?" question:

> When open-source models are included, we choose them
> carefully--only those with published research papers that disclose
> their training data and carry valid commercial-use licenses.

The FAQ is the only public place where Beeble acknowledges the use of open-source models. The product page and research page present the entire pipeline as "Powered by SwitchLight 3.0" without distinguishing which output passes come from SwitchLight versus third-party open-source models.

### Investor-facing claims

Beeble raised a $4.75M seed round in July 2024 at a reported $25M valuation, led by Basis Set Ventures and Fika Ventures. At the time, the company had approximately 7 employees. Press coverage of the funding consistently uses language like "foundational model" and "world-class foundational model in lighting" to describe SwitchLight--language that implies a novel, proprietary system rather than a pipeline of open-source components with proprietary weights.

These investor-facing claims were made through public press releases and coverage, not private communications. They are relevant because they represent how Beeble chose to characterize its technology to the market.
See [evidence/marketing_claims.md](../evidence/marketing_claims.md) for archived quotes.

The "true end-to-end video model" claim is particularly difficult to reconcile with the evidence. The application processes alpha, depth, and PBR as independent stages using separate CLI flags. PBR processing uses a frame stride (`--pbr-stride 1,2`), skipping frames rather than processing video natively. This is a pipeline of separate models, not an end-to-end video model.

## 7. What Beeble does well

This analysis would be incomplete without acknowledging what is genuinely Beeble's own work.

**SwitchLight is published research.** The CVPR 2024 paper describes a real methodology for training intrinsic decomposition models using lightstage data and physics-based losses. Whether the deployed architecture matches the paper's description is a separate question from whether the research itself has merit. It does.

**The trained weights are real work.** If the PBR model is built on standard architectures (as the evidence suggests), the value lies in the training data and training process. Acquiring lightstage data, designing loss functions, and iterating on model quality is substantial work. Pretrained model weights trained on high-quality domain-specific data are genuinely valuable, even when the architecture is standard.

**TensorRT compilation is non-trivial engineering.** Converting PyTorch models to TensorRT engines with INT8 quantization for real-time inference requires expertise. The application runs at interactive speeds on consumer GPUs with 11 GB+ VRAM.

**The product is a real product.** The desktop application, Nuke/Blender/Unreal integrations, cloud API, render queue, EXR output with ACEScg color space support, and overall UX represent substantial product engineering.

## 8. The real question

Most Beeble Studio users use the application for PBR extractions: alpha mattes, diffuse/albedo, normals, and depth maps.
The relighting features exist but are secondary to the extraction workflow for much of the user base.

The alpha and depth extractions are produced by open-source models used off the shelf. They can be replicated for free using the exact same libraries. The PBR extractions (normal, base color, roughness, specular, metallic) use models whose trained weights are proprietary, but whose architecture appears to be built from the same open-source frameworks available to anyone. Open-source alternatives for PBR decomposition now exist (CHORD from Ubisoft, RGB-X from Adobe) and are narrowing the quality gap, though they were trained on different data and may perform differently on portrait subjects. See [COMFYUI_GUIDE.md](COMFYUI_GUIDE.md) for a detailed guide on replicating each stage of the pipeline with open-source tools.

There is a common assumption that the training data represents a significant barrier to replication--that lightstage captures are expensive and rare, and therefore the trained weights are uniquely valuable. This may overstate the difficulty. For PBR decomposition training, what you need is a dataset of images paired with ground-truth PBR maps (albedo, normal, roughness, metallic). Modern 3D character pipelines--Unreal Engine MetaHumans, Blender character generators, procedural systems in Houdini--can render hundreds of thousands of such pairs with varied poses, lighting, skin tones, and clothing. The ground truth is inherent: you created the scene, so you already have the PBR maps. With model sizes under 2 GB and standard encoder-decoder architectures, the compute cost to train equivalent models from synthetic data is modest.

None of this means Beeble has no value. Convenience, polish, and integration are real things people pay for.
But the gap between what the marketing says ("Powered by SwitchLight 3.0") and what the application contains (a pipeline of mostly open-source components, some used directly and others used as architectural building blocks) is wider than what users would reasonably expect. And the technical moat may be thinner than investors were led to believe.

## 9. License compliance

All identified open-source components require attribution in redistributed software. Both the MIT License and the Apache 2.0 License require that copyright notices and license texts be included with any distribution of the software. No such attribution was found in Beeble Studio's application, documentation, or user-facing materials.

The scope of the issue extends beyond the core models. The application bundles approximately 48 Python packages in its `lib/` directory. Of these, only 6 include LICENSE files (cryptography, gdown, MarkupSafe, numpy, openexr, triton). The remaining 42 packages--including PyTorch, Kornia, Pillow, and others with attribution requirements--have no license files in the distribution.

For a detailed analysis of each license's requirements and what compliance would look like, see [LICENSE_ANALYSIS.md](LICENSE_ANALYSIS.md).

## 10. Conclusion

Beeble Studio's Video-to-VFX pipeline is a collection of independent models, most built from open-source components. The preprocessing stages are entirely open-source: background removal (InSPyReNet), depth estimation (Depth Anything V2), person detection (RT-DETR with PP-HGNet), face detection (Kornia), multi-object tracking (BoxMOT), edge detection (DexiNed), and super resolution (RRDB-Net). The PBR decomposition models appear to be built on open-source architectural frameworks (segmentation_models_pytorch, timm backbones) with proprietary trained weights.

The name "SwitchLight" does not appear anywhere in the application--not in the engine binary, not in the setup binary, not in the Electron app's 667 JavaScript files.
It is a marketing name that refers to no identifiable software component.

The CVPR 2024 paper describes a physics-based inverse rendering architecture. The deployed application contains no evidence of physics-based rendering code at inference time. The most likely explanation is that the physics (Cook-Torrance rendering) was used during training as a loss function, and the deployed model is a standard feedforward network that learned to predict PBR channels from that training process.

Beeble's marketing attributes the entire pipeline to SwitchLight 3.0. The evidence shows that alpha mattes come from InSPyReNet, depth maps come from Depth Anything V2, person detection comes from RT-DETR, tracking comes from BoxMOT, and the PBR models are built on segmentation_models_pytorch with PP-HGNet and ResNet backbones. The "true end-to-end video model" claim is contradicted by the independent processing flags and the frame stride parameter observed in the application.

Of the approximately 48 Python packages bundled with the application, only 6 include license files. The core open-source models' licenses require attribution that does not appear to be provided.

These findings can be independently verified using the methods described in [VERIFICATION_GUIDE.md](VERIFICATION_GUIDE.md).
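The bundled-license audit behind the section 9 figures is also easy to reproduce. The sketch below assumes a `lib/` layout with one subdirectory per package (a common PyInstaller-style arrangement); the path and license filenames are illustrative assumptions, not recovered configuration.

```python
# Sketch: audit which bundled packages ship a license file. The one-directory-
# per-package layout is an assumption about the bundle; adjust as needed.
from pathlib import Path

LICENSE_NAMES = {"LICENSE", "LICENSE.txt", "LICENSE.md", "COPYING"}

def audit_licenses(lib_dir):
    """Map each package directory name to whether it contains a license file."""
    result = {}
    for pkg in sorted(Path(lib_dir).iterdir()):
        if pkg.is_dir():
            result[pkg.name] = any((pkg / name).is_file() for name in LICENSE_NAMES)
    return result

# Usage (hypothetical install path):
# report = audit_licenses("/opt/beeble/lib")
# missing = [name for name, ok in report.items() if not ok]
```

A count of `missing` over the real bundle is the kind of check that produced the 6-of-48 figure reported above.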