Methodology
This document describes how the analysis was performed and, equally important, what was not done.
Approach
The analysis used standard forensic techniques that any security researcher or system administrator would recognize. No proprietary code was reverse-engineered, no encryption was broken, and no software was decompiled.
Seven complementary methods were used, each revealing different aspects of the application's composition.
What was done
1. String extraction from process memory
The core technique. When a Linux application runs, its loaded
libraries, model metadata, and configuration data are present in
process memory as readable strings. The strings command and
standard text search tools extract these without interacting with the
application's logic in any way.
This is the same technique used in malware analysis, software auditing, and license compliance verification across the industry. It reveals what libraries and models are loaded, but not how they are used or what proprietary code does with them.
Extracted strings were searched for known identifiers--library names, model checkpoint filenames, Python package docstrings, API signatures, and TensorRT layer names that correspond to published open-source projects. Each match was compared against the source code of the corresponding open-source project to confirm identity.
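The technique can be sketched in a few lines. This is a minimal re-implementation of what the strings command does, plus the identifier-matching step; the blob and identifier list below are illustrative, not the actual process memory or search terms.

```python
import re

def extract_strings(data: bytes, min_len: int = 6) -> list[str]:
    """Extract runs of printable ASCII, as the `strings` utility does."""
    return [m.group().decode("ascii")
            for m in re.finditer(rb"[\x20-\x7e]{%d,}" % min_len, data)]

def match_identifiers(strings: list[str], identifiers: list[str]) -> dict[str, list[str]]:
    """Group extracted strings by the known identifier they contain."""
    return {ident: [s for s in strings if ident in s] for ident in identifiers}

# Illustrative memory blob: readable strings embedded in binary noise.
blob = (b"\x00\x7fELF\x01"
        + b"torch/nn/modules/conv.py\x00"
        + b"\x90\x90"
        + b"resnet50_a1_0-14fe96d1.pth\x00\xff")
found = extract_strings(blob)
hits = match_identifiers(found, ["torch", "resnet50"])
```

Each hit would then be compared against the corresponding open-source project's source tree to confirm identity, as described above.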
2. TensorRT plugin analysis
TensorRT plugins are named components compiled for GPU inference.
Their names appear in the binary and reveal which neural network
operations are being used. Standard plugins (like convolution or
batch normalization) are not informative, but custom plugins with
distinctive names--like DisentangledAttention_TRT or
RnRes2FullFusion_TRT--identify specific architectures.
Plugin names, along with quantization patterns (e.g.,
int8_resnet50_stage_2_fusion), indicate which backbone
architectures have been compiled for production inference and at
what precision.
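Separating informative identifiers from noise is mechanical. The sketch below filters extracted strings for the two patterns discussed here, custom plugin registrations (the _TRT suffix) and int8 fusion kernel names; the sample strings are the ones quoted above plus one deliberately uninformative entry.

```python
import re

# Custom TensorRT plugin registrations carry a _TRT suffix; quantized
# fusion kernels follow an int8_<backbone>_..._fusion naming pattern.
PLUGIN_RE = re.compile(r"\b[A-Za-z][A-Za-z0-9]*_TRT\b")
FUSION_RE = re.compile(r"\bint8_[a-z0-9_]+_fusion\b")

def classify(strings: list[str]) -> tuple[list[str], list[str]]:
    """Return (custom plugin names, quantized fusion kernel names)."""
    plugins = sorted({m.group() for s in strings for m in PLUGIN_RE.finditer(s)})
    fusions = sorted({m.group() for s in strings for m in FUSION_RE.finditer(s)})
    return plugins, fusions

sample = [
    "DisentangledAttention_TRT",
    "RnRes2FullFusion_TRT",
    "int8_resnet50_stage_2_fusion",
    "generic convolution kernel",   # standard op: not informative, ignored
]
plugins, fusions = classify(sample)
```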
3. PyInstaller module listing
The beeble-ai binary is a PyInstaller-packaged Python application.
PyInstaller bundles Python modules into an archive whose table of
contents is readable without executing the application. This reveals
which Python packages are bundled, including both open-source
libraries and obfuscated proprietary modules.
The module listing identified 7,132 bundled Python modules, including the Pyarmor runtime used to encrypt Beeble's custom code. The obfuscated module structure (three main packages with randomized names, totaling approximately 82 modules) reveals the approximate scope of the proprietary code.
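Reading the archive's table of contents without executing anything is possible because PyInstaller appends a fixed-layout "cookie" to the binary that records where the TOC lives. A minimal sketch, assuming the modern cookie layout (struct format !8sIIii64s); the synthetic archive tail below stands in for the real binary:

```python
import struct

MAGIC = b"MEI\014\013\012\013\016"   # PyInstaller archive cookie magic
COOKIE_FMT = "!8sIIii64s"            # magic, pkg len, TOC offset, TOC len, py ver, lib name
COOKIE_SIZE = struct.calcsize(COOKIE_FMT)

def read_cookie(data: bytes):
    """Locate the trailing cookie and report where the table of contents
    sits, without unpacking or executing any bundled module."""
    pos = data.rfind(MAGIC)
    if pos < 0:
        return None
    magic, pkg_len, toc_off, toc_len, py_ver, lib = struct.unpack(
        COOKIE_FMT, data[pos:pos + COOKIE_SIZE])
    return {"package_length": pkg_len, "toc_offset": toc_off,
            "toc_length": toc_len, "python_version": py_ver}

# Synthetic archive tail for illustration (not the real beeble-ai binary).
tail = b"\x00" * 32 + struct.pack(COOKIE_FMT, MAGIC, 4096, 1024, 512, 310,
                                  b"libpython3.10.so")
info = read_cookie(tail)
```

In practice PyInstaller's own pyi-archive_viewer tool performs this listing, which is how the 7,132 bundled modules were enumerated.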
4. Electron app inspection
Beeble Studio's desktop UI is an Electron application. The compiled
JavaScript code in the dist/ directory is not obfuscated and
reveals how the UI orchestrates the Python engine binary. This
analysis examined:
- CLI flag construction (what arguments are passed to the engine)
- Database schema (what data is stored about jobs and outputs)
- Output directory structure (what files the engine produces)
- Progress reporting (what processing stages the engine reports)
This is the source of evidence about independent processing stages
(--run-alpha, --run-depth, --run-pbr), the PBR frame stride
parameter, and the output channel structure.
5. Library directory inventory
The application's lib/ directory contains approximately 48 Python
packages deployed alongside the main binary. These were inventoried
to determine which packages are present, their version numbers, and
whether license files are included. This is a straightforward
directory listing--no files were extracted, modified, or executed.
The inventory revealed specific library versions (PyTorch 2.8.0, timm 1.0.15, OpenCV 4.11.0.86), confirmed which packages are deployed as separate directories versus compiled into the PyInstaller binary, and identified the license file gap (only 6 of 48 packages include their license files).
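The inventory itself is a directory walk plus a license-file check. A minimal sketch using a throwaway layout; the two package names are examples drawn from the findings above, and the temporary directory stands in for the application's lib/ directory:

```python
from pathlib import Path
import tempfile

def inventory(lib_dir: Path) -> dict[str, bool]:
    """Map each deployed package directory to whether a license file ships with it."""
    report = {}
    for pkg in sorted(p for p in lib_dir.iterdir() if p.is_dir()):
        has_license = any(pkg.glob("LICENSE*")) or any(pkg.glob("COPYING*"))
        report[pkg.name] = has_license
    return report

# Illustrative layout (not the real application directory).
with tempfile.TemporaryDirectory() as tmp:
    lib = Path(tmp)
    (lib / "timm").mkdir()
    (lib / "timm" / "LICENSE").write_text("Apache-2.0")
    (lib / "cv2").mkdir()          # deployed without its license file
    report = inventory(lib)
```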
6. Engine setup log analysis
The application's setup process produces a detailed log file that records every file downloaded during installation. This log was read to understand the full scope of the deployment: total file count, total download size, and the complete list of downloaded components. The log is generated during normal operation and does not require any special access to read.
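Summarizing such a log reduces to line parsing. The sketch below assumes a hypothetical log-line format ("downloaded <path> (<n> bytes)"); the real setup log's exact format is not reproduced in this document, so both the regex and the sample lines are illustrative:

```python
import re

# Hypothetical log line format for illustration only.
LINE_RE = re.compile(r"^downloaded\s+(\S+)\s+\((\d+)\s+bytes\)$")

def summarize_log(text: str) -> dict:
    """Count downloaded files and total bytes recorded in a setup log."""
    files, total = [], 0
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            files.append(m.group(1))
            total += int(m.group(2))
    return {"file_count": len(files), "total_bytes": total, "files": files}

log = """\
downloaded models/pbr_v1.enc (1048576 bytes)
downloaded models/depth_v2.enc (524288 bytes)
setup complete
"""
summary = summarize_log(log)
```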
7. Manifest and public claims review
The application's manifest.json file, downloaded during normal
operation, was inspected for model references and metadata. Beeble's
website, documentation, FAQ, and research pages were reviewed to
understand how the technology is described to users. All public
claims were archived with URLs and timestamps.
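Inspecting the manifest is plain JSON reading. The key names and file entries below are assumptions made for illustration, not the application's actual schema; only the model version string ("m1.1.1") comes from the analysis itself:

```python
import json

# Hypothetical manifest structure; key names are illustrative assumptions.
manifest_text = """
{
  "model_version": "m1.1.1",
  "models": [
    {"name": "pbr", "file": "pbr_v1.enc"},
    {"name": "depth", "file": "depth_v2.enc"}
  ]
}
"""

def list_model_references(text: str) -> list[str]:
    """Pull model filenames out of a parsed manifest."""
    doc = json.loads(text)
    return [entry["file"] for entry in doc.get("models", [])]

refs = list_model_references(manifest_text)
```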
What was not done
This list defines the boundaries of the analysis and establishes that no proprietary technology was compromised.
- No decompilation or disassembly. The beeble-ai binary was never decompiled, disassembled, or analyzed at the instruction level. No tools like Ghidra, IDA Pro, or objdump were used to examine executable code.
- No encryption was broken. Beeble encrypts its model files with AES. Those encrypted files were not decrypted, and no attempt was made to recover encryption keys.
- No Pyarmor circumvention. The Pyarmor runtime that encrypts Beeble's custom Python code was not bypassed, attacked, or circumvented. The analysis relied on evidence visible outside the encrypted modules.
- No code reverse-engineering. The analysis did not examine how Beeble's proprietary code works, how models are orchestrated, or how SwitchLight processes its inputs. The only things identified were which third-party components are present and what architectural patterns they suggest.
- No network interception. No man-in-the-middle proxies or traffic analysis tools were used to intercept communications between the application and Beeble's servers.
- No license circumvention. The application was used under a valid license. No copy protection or DRM was circumvented.
Limitations
This analysis can identify what components are present and draw reasonable inferences about how they are used, but it cannot see inside the encrypted code or the encrypted model files. Several important limitations follow:
Architecture inference is indirect. The conclusion that PBR models use segmentation_models_pytorch architecture is based on the co-presence of that framework, compatible backbones, and the absence of alternative architectural patterns. It is not based on direct observation of the model graph. Pyarmor encryption prevents reading the code that connects these components.
TensorRT engines are opaque. The compiled model engines inside
the .enc files do not expose their internal layer structure to
string extraction. The TRT plugins and quantization patterns found
in the binary come from the TensorRT runtime environment, not from
inside the encrypted model files.
Single version analyzed. The analysis was performed on one version of the Linux desktop application (engine version r1.3.0, model version m1.1.1). Other versions and platforms may differ.
String extraction is inherently noisy. Some identified strings may come from transient data, cached web content, or libraries loaded but not actively used in inference. The findings focus on strings that are unambiguous--complete docstrings, model checkpoint URLs, TensorRT plugin registrations, and package-specific identifiers that cannot plausibly appear by accident.