Today marks the launch of Stable Cascade in its research preview. This innovative text to image model introduces an interesting three-stage approach, setting new benchmarks for quality, flexibility, fine-tuning, and efficiency with a focus on further eliminating hardware barriers.
Looks interesting, but also somewhat complicated. Now generation has to run through 3 models? I thought it was bad enough having a separate refiner model for sdxl.
It’s based on Würstchen which has real savings on training, but the VRAM requirement is through the roof.
Sdxl was 3 models too. Base, refiner and vae.
Cascade is 3. Lores, hires, vae.