Optimize SDXL generation: fix memory leak, add CPU offloading, update docs, and improve file saving

This commit is contained in:
Avery Felts 2026-01-26 14:01:22 -07:00
parent 713bda3bfa
commit bba3318ab5
3 changed files with 94 additions and 3 deletions

1
.gitignore vendored
View File

@@ -1,6 +1,7 @@
 venv/
 hf_cache/
 models/
+output/
 __pycache__/
 *.pyc
 .DS_Store

86
README.md Normal file
View File

@@ -0,0 +1,86 @@
# ⚡ SDXL-Lightning Image Generator for macOS
This project runs **Stable Diffusion XL Lightning (4-step)** natively on your Mac using **Apple Silicon (M1/M2/M3)** acceleration (MPS). It is optimized to generate high-quality 1024x1024 images in seconds with minimal setup.
## ✨ Features
- **Blazing Fast**: Uses the SDXL-Lightning 4-step UNet for rapid generation (~10-30s per image on M1/M2).
- **Native Mac Support**: Leverage your Mac's GPU with Metal Performance Shaders (MPS).
- **Memory Optimized**: Automatic CPU offloading to run even on 8GB/16GB Macs without crashing.
- **Local Privacy**: All models run locally on your machine. No cloud API keys needed.
- **Auto-Download**: Automatically fetches required model weights on first run.
## 🚀 Prerequisites
- macOS 12.3+ (Monterey or newer)
- Mac with Apple Silicon (M1, M2, M3)
- Python 3.9 or newer installed (check with `python3 --version`)
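Before installing, you can verify both prerequisites from Terminal (a quick sketch; the `platform.machine()` check is an extra convenience not mentioned in the script itself):

```bash
# Quick pre-install environment check
python3 --version                                        # should report Python 3.9 or newer
python3 -c "import platform; print(platform.machine())"  # 'arm64' indicates Apple Silicon
```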
## 🛠️ Installation & Setup
1. **Clone the repository** (if you haven't already):
```bash
git clone <your-repo-url>
cd "Image Generation"
```
2. **Create a virtual environment** to keep dependencies clean:
```bash
python3 -m venv venv
source venv/bin/activate
```
3. **Install dependencies**:
```bash
pip install -r requirements.txt
```
*(This installs `torch`, `diffusers`, `transformers`, and `accelerate` optimized for Mac)*
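The `requirements.txt` is assumed to contain roughly the following (exact version pins may differ in the repository; `safetensors` is included because the script loads the UNet weights with `safetensors.torch.load_file`):

```
torch
diffusers
transformers
accelerate
safetensors
```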
## 🎨 Usage
### Basic Generation
Generate an image with a simple text prompt. The first time you run this, it will download the necessary models (~11GB total; see the First Run Note below).
```bash
# Make sure your virtual environment is active!
source venv/bin/activate
# Run the generator
python generate.py "A futuristic cityscape at sunset, highly detailed, cyberpunk style, neon lights"
```
### Advanced Options
You can customize the resolution and quality settings:
```bash
python generate.py "An astronaut riding a horse on mars, realistic, 8k" --width 1024 --height 1024 --steps 4
```
| Flag | Default | Description |
| :--- | :--- | :--- |
| `prompt` | (Required) | The description of the image you want to generate. |
| `--width` | `1920` | Width of the image. Standard SDXL is optimized for `1024`. |
| `--height` | `1080` | Height of the image. |
| `--steps` | `4` | Number of inference steps. 4-8 is recommended for Lightning. |
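The flags above correspond to a command-line interface along these lines (a sketch assuming `argparse`; `generate.py`'s actual parsing code may differ, but the defaults mirror its `generate_image` signature):

```python
import argparse

def build_parser():
    # CLI sketch matching the documented flags; defaults mirror generate.py's signature.
    parser = argparse.ArgumentParser(description="SDXL-Lightning image generator")
    parser.add_argument("prompt", help="Text description of the image to generate")
    parser.add_argument("--width", type=int, default=1920, help="Image width in pixels")
    parser.add_argument("--height", type=int, default=1080, help="Image height in pixels")
    parser.add_argument("--steps", type=int, default=4, help="Inference steps (4-8 for Lightning)")
    return parser

args = build_parser().parse_args(["An astronaut riding a horse on mars", "--steps", "8"])
```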
**Output Location**:
Images are saved automatically to the `output/` folder in this directory.
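The output path is built relative to the script itself, roughly as follows (sketch mirroring the save logic in `generate.py`; the exact filename pattern is illustrative):

```python
import os
from datetime import datetime

# Resolve the output/ folder next to this script, creating it if needed.
save_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), "output")
os.makedirs(save_dir, exist_ok=True)

# Timestamped filename so repeated runs never overwrite each other.
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
out_path = os.path.join(save_dir, f"image_{timestamp}.png")  # filename pattern is an assumption
```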
## ⚡ First Run Note
The first time you run the script, it will download:
1. **SDXL Base Model**: ~6GB (Cached in `hf_cache/`)
2. **Lightning UNet**: ~5GB (Saved in `models/`)
A fast internet connection helps! Subsequent runs load the cached weights, so nothing needs to be downloaded again.
## 🔧 Troubleshooting
- **"MPS backend out of memory"**:
This means your Mac ran out of GPU memory. The script includes `pipe.enable_model_cpu_offload()` to prevent this. If it still occurs, try restarting your computer or closing other memory-heavy apps.
- **"Permission denied" errors**:
You might see "mpsgraph" permission errors in the terminal. These are harmless warnings from macOS's Metal framework and can be ignored. The image generation will still work.
- **Slow First Generation**:
Shader compilation happens on the very first run. Future generations will be much faster.
## 🤝 Contributing
Feel free to open issues or submit PRs to improve performance or add features!

generate.py
View File

@@ -26,7 +26,9 @@ def generate_image(prompt, width=1920, height=1080, steps=4):
     # Load UNet from local file
     print("Loading UNet from local file...")
-    unet = UNet2DConditionModel.from_config(base, subfolder="unet").to(device, torch.float16)
+    # Fix FutureWarning: Load config first
+    unet_config = UNet2DConditionModel.load_config(base, subfolder="unet")
+    unet = UNet2DConditionModel.from_config(unet_config).to(device, torch.float16)
     unet.load_state_dict(load_file(local_unet, device=device))
     # Load Pipeline
@@ -36,7 +38,8 @@ def generate_image(prompt, width=1920, height=1080, steps=4):
     # Optimizations for Mac/MPS
     print("Enabling attention slicing for memory efficiency...")
     pipe.enable_attention_slicing()
-    # pipe.enable_model_cpu_offload() # Uncomment if running out of memory
+    print("Enabling model CPU offloading for memory efficiency...")
+    pipe.enable_model_cpu_offload()
     # Ensure scheduler is correct for Lightning
     pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config, timestep_spacing="trailing")
@@ -46,7 +49,8 @@ def generate_image(prompt, width=1920, height=1080, steps=4):
     image = pipe(prompt, num_inference_steps=steps, guidance_scale=0, width=width, height=height).images[0]
-    # Save
-    save_dir = os.path.expanduser("~/Documents/Image Generations")
+    # Save
+    save_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), "output")
     os.makedirs(save_dir, exist_ok=True)
     timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")