Optimize SDXL generation: fix memory leak, add CPU offloading, update docs, and improve file saving

This commit is contained in:
Avery Felts 2026-01-26 14:01:22 -07:00
parent 713bda3bfa
commit bba3318ab5
3 changed files with 94 additions and 3 deletions

1
.gitignore vendored
View File

@@ -1,6 +1,7 @@
 venv/
 hf_cache/
 models/
+output/
 __pycache__/
 *.pyc
 .DS_Store

86
README.md Normal file
View File

@@ -0,0 +1,86 @@
# ⚡ SDXL-Lightning Image Generator for macOS
This project runs **Stable Diffusion XL Lightning (4-step)** natively on your Mac using **Apple Silicon (M1/M2/M3)** acceleration (MPS). It is optimized to generate high-quality 1024x1024 images in seconds with minimal setup.
## ✨ Features
- **Blazing Fast**: Uses the SDXL-Lightning 4-step UNet for rapid generation (~10-30s per image on M1/M2).
- **Native Mac Support**: Leverage your Mac's GPU with Metal Performance Shaders (MPS).
- **Memory Optimized**: Automatic CPU offloading to run even on 8GB/16GB Macs without crashing.
- **Local Privacy**: All models run locally on your machine. No cloud API keys needed.
- **Auto-Download**: Automatically fetches required model weights on first run.
## 🚀 Prerequisites
- macOS 12.3+ (Monterey or newer)
- Mac with Apple Silicon (M1, M2, M3)
- Python 3.9 or newer installed (check with `python3 --version`)
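Before installing, you can verify both prerequisites from Terminal (a quick sketch; the `platform.machine()` check is an extra convenience not mentioned in the script itself):

```bash
# Quick pre-install environment check
python3 --version                                        # should report Python 3.9 or newer
python3 -c "import platform; print(platform.machine())"  # 'arm64' indicates Apple Silicon
```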
## 🛠️ Installation & Setup
1. **Clone the repository** (if you haven't already):
```bash
git clone <your-repo-url>
cd "Image Generation"
```
2. **Create a virtual environment** to keep dependencies clean:
```bash
python3 -m venv venv
source venv/bin/activate
```
3. **Install dependencies**:
```bash
pip install -r requirements.txt
```
*(This installs `torch`, `diffusers`, `transformers`, and `accelerate` optimized for Mac)*
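The `requirements.txt` is assumed to contain roughly the following (exact version pins may differ in the repository; `safetensors` is included because the script loads the UNet weights with `safetensors.torch.load_file`):

```
torch
diffusers
transformers
accelerate
safetensors
```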
## 🎨 Usage
### Basic Generation
Generate an image with a simple text prompt. The first time you run this, it will download the necessary models (~11GB total; see the First Run Note below).
```bash
# Make sure your virtual environment is active!
source venv/bin/activate
# Run the generator
python generate.py "A futuristic cityscape at sunset, highly detailed, cyberpunk style, neon lights"
```
### Advanced Options
You can customize the resolution and quality settings:
```bash
python generate.py "An astronaut riding a horse on mars, realistic, 8k" --width 1024 --height 1024 --steps 4
```
| Flag | Default | Description |
| :--- | :--- | :--- |
| `prompt` | (Required) | The description of the image you want to generate. |
| `--width` | `1920` | Width of the image. Standard SDXL is optimized for `1024`. |
| `--height` | `1080` | Height of the image. |
| `--steps` | `4` | Number of inference steps. 4-8 is recommended for Lightning. |
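The flags above correspond to a command-line interface along these lines (a sketch assuming `argparse`; `generate.py`'s actual parsing code may differ, but the defaults mirror its `generate_image` signature):

```python
import argparse

def build_parser():
    # CLI sketch matching the documented flags; defaults mirror generate.py's signature.
    parser = argparse.ArgumentParser(description="SDXL-Lightning image generator")
    parser.add_argument("prompt", help="Text description of the image to generate")
    parser.add_argument("--width", type=int, default=1920, help="Image width in pixels")
    parser.add_argument("--height", type=int, default=1080, help="Image height in pixels")
    parser.add_argument("--steps", type=int, default=4, help="Inference steps (4-8 for Lightning)")
    return parser

args = build_parser().parse_args(["An astronaut riding a horse on mars", "--steps", "8"])
```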
**Output Location**:
Images are saved automatically to the `output/` folder in this directory.
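The output path is built relative to the script itself, roughly as follows (sketch mirroring the save logic in `generate.py`; the exact filename pattern is illustrative):

```python
import os
from datetime import datetime

# Resolve the output/ folder next to this script, creating it if needed.
save_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), "output")
os.makedirs(save_dir, exist_ok=True)

# Timestamped filename so repeated runs never overwrite each other.
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
out_path = os.path.join(save_dir, f"image_{timestamp}.png")  # filename pattern is an assumption
```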
## ⚡ First Run Note
The first time you run the script, it will download:
1. **SDXL Base Model**: ~6GB (Cached in `hf_cache/`)
2. **Lightning UNet**: ~5GB (Saved in `models/`)
A fast internet connection helps! Subsequent runs load the cached weights, so nothing needs to be downloaded again.
## 🔧 Troubleshooting
- **"MPS backend out of memory"**:
This means your Mac ran out of GPU memory. The script includes `pipe.enable_model_cpu_offload()` to prevent this. If it still occurs, try restarting your computer or closing other memory-heavy apps.
- **"Permission denied" errors**:
You might see "mpsgraph" permission errors in the terminal. These are harmless warnings from macOS's Metal framework and can be ignored. The image generation will still work.
- **Slow First Generation**:
Shader compilation happens on the very first run. Future generations will be much faster.
## 🤝 Contributing
Feel free to open issues or submit PRs to improve performance or add features!

generate.py
View File

@@ -26,7 +26,9 @@ def generate_image(prompt, width=1920, height=1080, steps=4):
     # Load UNet from local file
     print("Loading UNet from local file...")
-    unet = UNet2DConditionModel.from_config(base, subfolder="unet").to(device, torch.float16)
+    # Fix FutureWarning: Load config first
+    unet_config = UNet2DConditionModel.load_config(base, subfolder="unet")
+    unet = UNet2DConditionModel.from_config(unet_config).to(device, torch.float16)
     unet.load_state_dict(load_file(local_unet, device=device))
     # Load Pipeline
@@ -36,7 +38,8 @@ def generate_image(prompt, width=1920, height=1080, steps=4):
     # Optimizations for Mac/MPS
     print("Enabling attention slicing for memory efficiency...")
     pipe.enable_attention_slicing()
-    # pipe.enable_model_cpu_offload() # Uncomment if running out of memory
+    print("Enabling model CPU offloading for memory efficiency...")
+    pipe.enable_model_cpu_offload()
     # Ensure scheduler is correct for Lightning
     pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config, timestep_spacing="trailing")
@@ -46,7 +49,8 @@ def generate_image(prompt, width=1920, height=1080, steps=4):
     image = pipe(prompt, num_inference_steps=steps, guidance_scale=0, width=width, height=height).images[0]
-    # Save
-    save_dir = os.path.expanduser("~/Documents/Image Generations")
+    # Save
+    save_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), "output")
     os.makedirs(save_dir, exist_ok=True)
     timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")