Nicholai 86accadc28 docs: add competitive landscape and deep dive findings
Version evolution (SL 1.0→2.0→3.0), team background, no
patents, NVIDIA DiffusionRenderer as open-source competitor,
dataset landscape (POLAR, SynthLight, etc.), botocore/AWS SDK
in privacy app, MetaHuman EULA fix, user data controversy,
and DiffusionRenderer ComfyUI integration across all docs.
2026-01-26 12:41:01 -07:00

# Methodology
This document describes how the analysis was performed and, equally
important, what was not done.
## Approach
The analysis used standard forensic techniques that any security
researcher or system administrator would recognize. No proprietary
code was reverse-engineered, no encryption was broken, and no
software was decompiled.
Seven complementary methods were used, each revealing different
aspects of the application's composition.
## What was done
### 1. String extraction from process memory
The core technique. When a Linux application runs, its loaded
libraries, model metadata, and configuration data are present in
process memory as readable strings. The `strings` command and
standard text search tools extract these without interacting with the
application's logic in any way.
This is the same technique used in malware analysis, software
auditing, and license compliance verification across the industry.
It reveals what libraries and models are loaded, but not how they
are used or what proprietary code does with them.
Extracted strings were searched for known identifiers--library names,
model checkpoint filenames, Python package docstrings, API
signatures, and TensorRT layer names that correspond to published
open-source projects. Each match was compared against the source code
of the corresponding open-source project to confirm identity.
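As a concrete sketch of this step, the snippet below mimics `strings -n 8` on a small synthetic byte blob and checks the results against a purely illustrative list of known identifiers. A real analysis would read a core dump or `/proc/<pid>/mem` rather than an inline constant.

```python
import re

# Hypothetical fragment of raw process memory; the embedded names are
# illustrative, not actual findings from the Beeble binary.
memory = (
    b"\x00\x01torch/nn/modules/conv.py\x00\xff\x02"
    b"segmentation_models_pytorch\x00\x7f"
    b"DisentangledAttention_TRT\x00random\x03noise"
)

# Mimic `strings -n 8`: printable ASCII runs of at least 8 bytes.
printable = re.compile(rb"[\x20-\x7e]{8,}")
found = [m.group().decode() for m in printable.finditer(memory)]

# Known identifiers from published open-source projects (illustrative list).
known = {"segmentation_models_pytorch", "DisentangledAttention_TRT"}
matches = [s for s in found if any(k in s for k in known)]
print(matches)
```

Each match would then be compared against the corresponding project's source, as described above.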
### 2. TensorRT plugin analysis
TensorRT plugins are named components compiled for GPU inference.
Their names appear in the binary and reveal which neural network
operations are being used. Standard plugins (like convolution or
batch normalization) are not informative, but custom plugins with
distinctive names--like `DisentangledAttention_TRT` or
`RnRes2FullFusion_TRT`--identify specific architectures.
Plugin names, along with quantization patterns (e.g.,
`int8_resnet50_stage_2_fusion`), indicate which backbone
architectures have been compiled for production inference and at
what precision.
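A minimal sketch of how quantization patterns can be parsed out of extracted strings. The string list and the generic/distinctive labels are illustrative, and the `<precision>_<backbone>_stage_<n>_fusion` naming pattern is an assumption modeled on the single example quoted above.

```python
import re

# Plugin and fusion names as they might appear among extracted strings.
strings_found = [
    "CustomSkipLayerNormPluginDynamic",  # generic transformer plumbing
    "DisentangledAttention_TRT",         # distinctive: identifies an architecture
    "RnRes2FullFusion_TRT",              # distinctive: identifies an architecture
    "int8_resnet50_stage_2_fusion",      # quantization pattern
]

# Assumed pattern: <precision>_<backbone>_stage_<n>_fusion
quant = re.compile(r"^(int8|fp16|fp32)_([a-z0-9]+)_stage_(\d+)_fusion$")
for s in strings_found:
    m = quant.match(s)
    if m:
        precision, backbone, stage = m.groups()
        print(f"{backbone} compiled at {precision} (stage {stage})")
```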
### 3. PyInstaller module listing
The `beeble-ai` binary is a PyInstaller-packaged Python application.
PyInstaller bundles Python modules into an archive whose table of
contents is readable without executing the application. This reveals
which Python packages are bundled, including both open-source
libraries and obfuscated proprietary modules.
The module listing identified 7,132 bundled Python modules, including
the Pyarmor runtime used to encrypt Beeble's custom code. The
obfuscated module structure (three main packages with randomized
names, totaling approximately 82 modules) reveals the approximate
scope of the proprietary code.
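The scoping step can be sketched as follows. All module names here are invented, and the six-character-alphanumeric heuristic for spotting randomized package names is an assumption for illustration, not Beeble's actual naming scheme.

```python
from collections import Counter

# Hypothetical excerpt of a PyInstaller table of contents: a few
# ordinary packages plus obfuscated packages with randomized names.
toc = [
    "numpy.core.multiarray",
    "pytransform.__init__",              # Pyarmor runtime marker
    "xq81kd.render", "xq81kd.io",        # obfuscated package 1
    "mz04vp.stage_a", "mz04vp.stage_b",  # obfuscated package 2
]

# Count modules per top-level package, then flag names that look
# randomized (assumed heuristic: 6 alphanumeric chars including digits).
top_level = Counter(name.split(".")[0] for name in toc)
obfuscated = {pkg: n for pkg, n in top_level.items()
              if len(pkg) == 6 and pkg.isalnum() and not pkg.isalpha()}
print(f"{len(toc)} modules total; obfuscated packages: {obfuscated}")
```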
### 4. Electron app inspection
Beeble Studio's desktop UI is an Electron application. The compiled
JavaScript code in the `dist/` directory is not obfuscated and
reveals how the UI orchestrates the Python engine binary. This
analysis examined:
- CLI flag construction (what arguments are passed to the engine)
- Database schema (what data is stored about jobs and outputs)
- Output directory structure (what files the engine produces)
- Progress reporting (what processing stages the engine reports)
This is the source of evidence about independent processing stages
(`--run-alpha`, `--run-depth`, `--run-pbr`), the PBR frame stride
parameter, and the output channel structure.
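To illustrate, the snippet below scans an invented fragment of compiled JavaScript for engine CLI flags. The `--pbr-frame-stride` flag name is hypothetical; the document establishes only that a PBR frame stride parameter exists, not what it is called.

```python
import re

# Invented fragment of unobfuscated compiled JavaScript, of the kind
# found in the Electron app's dist/ directory.
js = """
const args = ["--run-alpha"];
if (job.depth) args.push("--run-depth");
if (job.pbr) { args.push("--run-pbr", "--pbr-frame-stride", String(stride)); }
spawn(enginePath, args);
"""

# Collect every quoted long-form flag passed to the engine.
flags = sorted(set(re.findall(r'"(--[a-z-]+)"', js)))
print(flags)
```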
### 5. Library directory inventory
The application's `lib/` directory contains 48 Python
packages deployed alongside the main binary. These were inventoried
to determine which packages are present, their version numbers, and
whether license files are included. This is a straightforward
directory listing--no files were extracted, modified, or executed.
The inventory revealed specific library versions (PyTorch 2.8.0,
timm 1.0.15, OpenCV 4.11.0.86), confirmed which packages are
deployed as separate directories versus compiled into the PyInstaller
binary, and identified the license file gap (only 6 of 48 packages
include their license files).
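A self-contained sketch of the inventory logic, run against a toy `lib/` tree; the package names, versions, and license contents are illustrative.

```python
import pathlib
import tempfile

# Build a toy lib/ tree standing in for the application's directory.
lib = pathlib.Path(tempfile.mkdtemp()) / "lib"
for pkg, has_license in [("timm", True), ("cv2", False), ("torch", False)]:
    d = lib / pkg
    d.mkdir(parents=True)
    if has_license:
        (d / "LICENSE").write_text("Apache-2.0")

# Straightforward directory listing: which packages are present, and
# which ship a license file alongside their code.
packages = sorted(p.name for p in lib.iterdir() if p.is_dir())
licensed = [p for p in packages
            if any((lib / p / n).exists() for n in ("LICENSE", "LICENSE.txt"))]
print(f"{len(licensed)} of {len(packages)} packages include a license file")
```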
### 6. Engine setup log analysis
The application's setup process produces a detailed log file that
records every file downloaded during installation. This log was
read to understand the full scope of the deployment: total file
count, total download size, and the complete list of downloaded
components. The log is generated during normal operation and does
not require any special access to read.
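The log-reading step might look like the following. Since the real log's format is not quoted in this document, the line syntax, file names, and sizes below are assumptions.

```python
import re

# Invented log lines in a plausible format.
log = """\
downloaded models/pbr_albedo.enc (412.3 MiB)
downloaded lib/libpython3.11.so.1.0 (22.1 MiB)
downloaded manifest.json (0.1 MiB)
"""

# Tally file count and total download size from the log entries.
entry = re.compile(r"downloaded (\S+) \(([\d.]+) MiB\)")
files = entry.findall(log)
total = sum(float(size) for _, size in files)
print(f"{len(files)} files, {total:.1f} MiB total")
```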
### 7. Manifest and public claims review
The application's `manifest.json` file, downloaded during normal
operation, was inspected for model references and metadata. Beeble's
website, documentation, FAQ, and research pages were reviewed to
understand how the technology is described to users. All public
claims were archived with URLs and timestamps.
The manifest confirms Python 3.11 as the runtime (via the presence of
`libpython3.11.so.1.0` in the downloaded files). TensorRT 10.12.0 was
also identified, and notably, its builder resources ship alongside
the runtime, not just the inference libraries. This suggests possible
on-device model compilation: TensorRT engines may be compiled locally
on the user's GPU rather than shipped as pre-built binaries.
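A sketch of the manifest inspection step. The JSON shape and most file names are invented, anchored only to the Python 3.11 and TensorRT 10.12.0 findings reported above.

```python
import json

# Cut-down hypothetical manifest; only the runtime and TensorRT
# version details correspond to findings described in this document.
manifest = json.loads("""
{
  "engine": "r1.3.0",
  "files": [
    "libpython3.11.so.1.0",
    "libnvinfer.so.10.12.0",
    "libnvinfer_builder_resource.so.10.12.0",
    "models/pbr_albedo.enc"
  ]
}
""")

# Look for the Python runtime and for TensorRT builder components,
# whose presence would suggest on-device engine compilation.
files = manifest["files"]
runtime = next((f for f in files if f.startswith("libpython")), None)
builder = [f for f in files if "builder" in f]
print("Python runtime:", runtime)
print("Builder components:", builder)
```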
## What was not done
This list defines the boundaries of the analysis and establishes
that no proprietary technology was compromised.
- **No decompilation or disassembly.** The `beeble-ai` binary was
never decompiled, disassembled, or analyzed at the instruction
level. No tools like Ghidra, IDA Pro, or objdump were used to
examine executable code.
- **No encryption was broken.** Beeble encrypts its model files with
AES. Those encrypted files were not decrypted, and no attempt was
made to recover encryption keys.
- **No Pyarmor circumvention.** The Pyarmor runtime that encrypts
Beeble's custom Python code was not bypassed, attacked, or
circumvented. The analysis relied on evidence visible outside the
encrypted modules.
- **No code reverse-engineering.** The analysis did not examine how
Beeble's proprietary code works, how models are orchestrated, or
how SwitchLight processes its inputs. The only things identified
were which third-party components are present and what
architectural patterns they suggest.
- **No network interception.** No man-in-the-middle proxies or
traffic analysis tools were used to intercept communications
between the application and Beeble's servers.
- **No license circumvention.** The application was used under a
valid license. No copy protection or DRM was circumvented.
## Limitations
This analysis can identify what components are present and draw
reasonable inferences about how they are used, but it cannot see
inside the encrypted code or the encrypted model files. Several
important limitations follow:
**Architecture inference is indirect.** The conclusion that PBR
models use segmentation_models_pytorch architecture is based on
the co-presence of that framework, compatible backbones, and the
absence of alternative architectural patterns. It is not based on
direct observation of the model graph. Pyarmor encryption prevents
reading the code that connects these components.
**TensorRT engines are opaque.** The compiled model engines inside
the `.enc` files do not expose their internal layer structure to
string extraction. The TRT plugins and quantization patterns found
in the binary come from the TensorRT runtime environment, not from
inside the encrypted model files.
**Single version analyzed.** The analysis was performed on one
version of the Linux desktop application (engine version r1.3.0,
model version m1.1.1). Other versions and platforms may differ.
**String extraction is inherently noisy.** Some identified strings
may come from transient data, cached web content, or libraries
loaded but not actively used in inference. The findings focus on
strings that are unambiguous--complete docstrings, model checkpoint
URLs, TensorRT plugin registrations, and package-specific identifiers
that cannot plausibly appear by accident.