Methodology
This document describes how the analysis was performed and, equally important, what was not done.
Approach
The analysis used standard forensic techniques that any security researcher or system administrator would recognize. No proprietary code was reverse-engineered, no encryption was broken, and no software was decompiled.
Seven complementary methods were used, each revealing different aspects of the application's composition.
What was done
1. String extraction from process memory
The core technique. When a Linux application runs, its loaded
libraries, model metadata, and configuration data are present in
process memory as readable strings. The strings command and
standard text search tools extract these without interacting with the
application's logic in any way.
This is the same technique used in malware analysis, software auditing, and license compliance verification across the industry. It reveals what libraries and models are loaded, but not how they are used or what proprietary code does with them.
Extracted strings were searched for known identifiers--library names, model checkpoint filenames, Python package docstrings, API signatures, and TensorRT layer names that correspond to published open-source projects. Each match was compared against the source code of the corresponding open-source project to confirm identity.
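The technique can be sketched in a few lines. This is a minimal re-implementation of what the strings command does, plus the identifier-matching step; the blob and identifier list below are illustrative, not the actual process memory or search terms.

```python
import re

def extract_strings(data: bytes, min_len: int = 6) -> list[str]:
    """Extract runs of printable ASCII, as the `strings` utility does."""
    return [m.group().decode("ascii")
            for m in re.finditer(rb"[\x20-\x7e]{%d,}" % min_len, data)]

def match_identifiers(strings: list[str], identifiers: list[str]) -> dict[str, list[str]]:
    """Group extracted strings by the known identifier they contain."""
    return {ident: [s for s in strings if ident in s] for ident in identifiers}

# Illustrative memory blob: readable strings embedded in binary noise.
blob = (b"\x00\x7fELF\x01"
        + b"torch/nn/modules/conv.py\x00"
        + b"\x90\x90"
        + b"resnet50_a1_0-14fe96d1.pth\x00\xff")
found = extract_strings(blob)
hits = match_identifiers(found, ["torch", "resnet50"])
```

Each hit would then be compared against the corresponding open-source project's source tree to confirm identity, as described above.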
2. TensorRT plugin analysis
TensorRT plugins are named components compiled for GPU inference.
Their names appear in the binary and reveal which neural network
operations are being used. Standard plugins (like convolution or
batch normalization) are not informative, but custom plugins with
distinctive names--like DisentangledAttention_TRT or
RnRes2FullFusion_TRT--identify specific architectures.
Plugin names, along with quantization patterns (e.g.,
int8_resnet50_stage_2_fusion), indicate which backbone
architectures have been compiled for production inference and at
what precision.
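Separating informative identifiers from noise is mechanical. The sketch below filters extracted strings for the two patterns discussed here, custom plugin registrations (the _TRT suffix) and int8 fusion kernel names; the sample strings are the ones quoted above plus one deliberately uninformative entry.

```python
import re

# Custom TensorRT plugin registrations carry a _TRT suffix; quantized
# fusion kernels follow an int8_<backbone>_..._fusion naming pattern.
PLUGIN_RE = re.compile(r"\b[A-Za-z][A-Za-z0-9]*_TRT\b")
FUSION_RE = re.compile(r"\bint8_[a-z0-9_]+_fusion\b")

def classify(strings: list[str]) -> tuple[list[str], list[str]]:
    """Return (custom plugin names, quantized fusion kernel names)."""
    plugins = sorted({m.group() for s in strings for m in PLUGIN_RE.finditer(s)})
    fusions = sorted({m.group() for s in strings for m in FUSION_RE.finditer(s)})
    return plugins, fusions

sample = [
    "DisentangledAttention_TRT",
    "RnRes2FullFusion_TRT",
    "int8_resnet50_stage_2_fusion",
    "generic convolution kernel",   # standard op: not informative, ignored
]
plugins, fusions = classify(sample)
```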
3. PyInstaller module listing
The beeble-ai binary is a PyInstaller-packaged Python application.
PyInstaller bundles Python modules into an archive whose table of
contents is readable without executing the application. This reveals
which Python packages are bundled, including both open-source
libraries and obfuscated proprietary modules.
The module listing identified 7,132 bundled Python modules, including the Pyarmor runtime used to encrypt Beeble's custom code. The obfuscated module structure (three main packages with randomized names, totaling approximately 82 modules) reveals the approximate scope of the proprietary code.
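Reading the archive's table of contents without executing anything is possible because PyInstaller appends a fixed-layout "cookie" to the binary that records where the TOC lives. A minimal sketch, assuming the modern cookie layout (struct format !8sIIii64s); the synthetic archive tail below stands in for the real binary:

```python
import struct

MAGIC = b"MEI\014\013\012\013\016"   # PyInstaller archive cookie magic
COOKIE_FMT = "!8sIIii64s"            # magic, pkg len, TOC offset, TOC len, py ver, lib name
COOKIE_SIZE = struct.calcsize(COOKIE_FMT)

def read_cookie(data: bytes):
    """Locate the trailing cookie and report where the table of contents
    sits, without unpacking or executing any bundled module."""
    pos = data.rfind(MAGIC)
    if pos < 0:
        return None
    magic, pkg_len, toc_off, toc_len, py_ver, lib = struct.unpack(
        COOKIE_FMT, data[pos:pos + COOKIE_SIZE])
    return {"package_length": pkg_len, "toc_offset": toc_off,
            "toc_length": toc_len, "python_version": py_ver}

# Synthetic archive tail for illustration (not the real beeble-ai binary).
tail = b"\x00" * 32 + struct.pack(COOKIE_FMT, MAGIC, 4096, 1024, 512, 310,
                                  b"libpython3.10.so")
info = read_cookie(tail)
```

In practice PyInstaller's own pyi-archive_viewer tool performs this listing, which is how the 7,132 bundled modules were enumerated.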
4. Electron app inspection
Beeble Studio's desktop UI is an Electron application. The compiled
JavaScript code in the dist/ directory is not obfuscated and
reveals how the UI orchestrates the Python engine binary. This
analysis examined:
- CLI flag construction (what arguments are passed to the engine)
- Database schema (what data is stored about jobs and outputs)
- Output directory structure (what files the engine produces)
- Progress reporting (what processing stages the engine reports)
This is the source of evidence about independent processing stages
(--run-alpha, --run-depth, --run-pbr), the PBR frame stride
parameter, and the output channel structure.
5. Library directory inventory
The application's lib/ directory contains approximately 48 Python
packages deployed alongside the main binary. These were inventoried
to determine which packages are present, their version numbers, and
whether license files are included. This is a straightforward
directory listing--no files were extracted, modified, or executed.
The inventory revealed specific library versions (PyTorch 2.8.0, timm 1.0.15, OpenCV 4.11.0.86), confirmed which packages are deployed as separate directories versus compiled into the PyInstaller binary, and identified the license file gap (only 6 of 48 packages include their license files).
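The inventory itself is a directory walk plus a license-file check. A minimal sketch using a throwaway layout; the two package names are examples drawn from the findings above, and the temporary directory stands in for the application's lib/ directory:

```python
from pathlib import Path
import tempfile

def inventory(lib_dir: Path) -> dict[str, bool]:
    """Map each deployed package directory to whether a license file ships with it."""
    report = {}
    for pkg in sorted(p for p in lib_dir.iterdir() if p.is_dir()):
        has_license = any(pkg.glob("LICENSE*")) or any(pkg.glob("COPYING*"))
        report[pkg.name] = has_license
    return report

# Illustrative layout (not the real application directory).
with tempfile.TemporaryDirectory() as tmp:
    lib = Path(tmp)
    (lib / "timm").mkdir()
    (lib / "timm" / "LICENSE").write_text("Apache-2.0")
    (lib / "cv2").mkdir()          # deployed without its license file
    report = inventory(lib)
```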
6. Engine setup log analysis
The application's setup process produces a detailed log file that records every file downloaded during installation. This log was read to understand the full scope of the deployment: total file count, total download size, and the complete list of downloaded components. The log is generated during normal operation and does not require any special access to read.
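Summarizing such a log reduces to line parsing. The sketch below assumes a hypothetical log-line format ("downloaded <path> (<n> bytes)"); the real setup log's exact format is not reproduced in this document, so both the regex and the sample lines are illustrative:

```python
import re

# Hypothetical log line format for illustration only.
LINE_RE = re.compile(r"^downloaded\s+(\S+)\s+\((\d+)\s+bytes\)$")

def summarize_log(text: str) -> dict:
    """Count downloaded files and total bytes recorded in a setup log."""
    files, total = [], 0
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            files.append(m.group(1))
            total += int(m.group(2))
    return {"file_count": len(files), "total_bytes": total, "files": files}

log = """\
downloaded models/pbr_v1.enc (1048576 bytes)
downloaded models/depth_v2.enc (524288 bytes)
setup complete
"""
summary = summarize_log(log)
```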
7. Manifest and public claims review
The application's manifest.json file, downloaded during normal
operation, was inspected for model references and metadata. Beeble's
website, documentation, FAQ, and research pages were reviewed to
understand how the technology is described to users. All public
claims were archived with URLs and timestamps.
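Inspecting the manifest is plain JSON reading. The key names and file entries below are assumptions made for illustration, not the application's actual schema; only the model version string ("m1.1.1") comes from the analysis itself:

```python
import json

# Hypothetical manifest structure; key names are illustrative assumptions.
manifest_text = """
{
  "model_version": "m1.1.1",
  "models": [
    {"name": "pbr", "file": "pbr_v1.enc"},
    {"name": "depth", "file": "depth_v2.enc"}
  ]
}
"""

def list_model_references(text: str) -> list[str]:
    """Pull model filenames out of a parsed manifest."""
    doc = json.loads(text)
    return [entry["file"] for entry in doc.get("models", [])]

refs = list_model_references(manifest_text)
```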
What was not done
This list defines the boundaries of the analysis and establishes that no proprietary technology was compromised.
- No decompilation or disassembly. The beeble-ai binary was never decompiled, disassembled, or analyzed at the instruction level. No tools like Ghidra, IDA Pro, or objdump were used to examine executable code.
- No encryption was broken. Beeble encrypts its model files with AES. Those encrypted files were not decrypted, and no attempt was made to recover encryption keys.
- No Pyarmor circumvention. The Pyarmor runtime that encrypts Beeble's custom Python code was not bypassed, attacked, or circumvented. The analysis relied on evidence visible outside the encrypted modules.
- No code reverse-engineering. The analysis did not examine how Beeble's proprietary code works, how models are orchestrated, or how SwitchLight processes its inputs. The only things identified were which third-party components are present and what architectural patterns they suggest.
- No network interception. No man-in-the-middle proxies or traffic analysis tools were used to intercept communications between the application and Beeble's servers.
- No license circumvention. The application was used under a valid license. No copy protection or DRM was circumvented.
Limitations
This analysis can identify what components are present and draw reasonable inferences about how they are used, but it cannot see inside the encrypted code or the encrypted model files. Several important limitations follow:
Architecture inference is indirect. The conclusion that PBR models use segmentation_models_pytorch architecture is based on the co-presence of that framework, compatible backbones, and the absence of alternative architectural patterns. It is not based on direct observation of the model graph. Pyarmor encryption prevents reading the code that connects these components.
TensorRT engines are opaque. The compiled model engines inside
the .enc files do not expose their internal layer structure to
string extraction. The TRT plugins and quantization patterns found
in the binary come from the TensorRT runtime environment, not from
inside the encrypted model files.
Single version analyzed. The analysis was performed on one version of the Linux desktop application (engine version r1.3.0, model version m1.1.1). Other versions and platforms may differ.
String extraction is inherently noisy. Some identified strings may come from transient data, cached web content, or libraries loaded but not actively used in inference. The findings focus on strings that are unambiguous--complete docstrings, model checkpoint URLs, TensorRT plugin registrations, and package-specific identifiers that cannot plausibly appear by accident.