A comparative study of Flux and Stable Diffusion 1.5 LoRA training and image generation.
Context
Academic Project
Role
Trainer and Researcher
Year
2025
Industry
Artificial Intelligence
Introduction & Motivation
In the field of AI-generated imagery, small adjustments to base models can yield drastic changes in output quality and style. LoRA (Low-Rank Adaptation) is one such technique that allows large models to be fine-tuned with minimal computational overhead. In this exploration, I showcase my skills by training two LoRA models: one on Flux and one on Stable Diffusion 1.5.
These models were trained on a synthetic dataset of 10 images (each 2048×2048), focusing on an artistic concept of characters with half-white and half-black hair. The project’s primary goal: compare training configurations, performance, image quality, and VRAM usage.
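The core LoRA idea is easy to sketch: rather than updating a full weight matrix W, training learns two small matrices B and A whose product forms a low-rank update. The dimensions below are purely illustrative, not the actual layer sizes in Flux or SD 1.5:

```python
import numpy as np

# Illustrative LoRA update: W_eff = W + (alpha / r) * (B @ A)
d_out, d_in, r, alpha = 320, 768, 16, 16   # toy dimensions; r is the LoRA rank

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))     # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))                   # trainable up-projection (zero init)

W_eff = W + (alpha / r) * (B @ A)          # merged weight used at inference

full_params = W.size           # 245,760 parameters to fine-tune W directly
lora_params = A.size + B.size  # 17,408 trainable parameters with LoRA (~7%)
```

Because only A and B are trained, the optimizer state and gradients are a small fraction of a full fine-tune, which is what makes both runs in this project feasible on a single GPU.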
Key Highlights
Dataset: 10 synthetic images, each depicting half-white, half-black hairstyles
Models: Flux vs. Stable Diffusion 1.5
Training: LoRA fine-tuning with prompts focusing on half-white/half-black hair
The dataset contains 10 high-resolution (2048×2048) images, each carefully curated to depict characters with a distinct half-white, half-black hair color. This ensures the models learn to reconstruct this unique hair pattern consistently. (Data generated in Midjourney)
Why Half & Half Hair?
Visually distinct concept to test how quickly and effectively models adapt to a particular style.
Encourages the model to learn fine details (such as color partition) and handle unusual patterns.
Training prompts paired with these images consistently mention:
“…a person with half white hair and half black hair…” and often emphasize facial features and clothing details.
By maintaining consistency in the prompts, both models could be fairly compared on how they capture and recreate this distinctive style.
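Trainers like the ones used here typically expect each image to sit next to a same-named .txt file holding its caption. A minimal sketch of that pairing convention (the folder layout and helper name are illustrative):

```python
from pathlib import Path

def load_pairs(data_dir: str) -> list[tuple[Path, str]]:
    """Pair each image with the caption stored in its same-named .txt file."""
    pairs = []
    for img in sorted(Path(data_dir).glob("*.png")):
        cap = img.with_suffix(".txt")
        caption = cap.read_text(encoding="utf-8").strip() if cap.exists() else ""
        pairs.append((img, caption))
    return pairs
```

Every caption in this dataset contains the phrase "half white hair and half black hair", which is what both trainers learn to associate with the concept.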
Training with Flux LoRA
Overview of Flux
Flux is a specialized model that offers high-fidelity one-shot image generation. It can produce exquisitely detailed images but is more resource-intensive during both training and inference. Here, I used an Nvidia GPU with 24GB of VRAM to accommodate its higher memory demands.
an animated woman with half white hair and half black hair with blue eyes wearing a black dress against a brown background.
a digital painting of a woman with half white hair and half black hair and blue eyes wearing a black turtle neck sweater. Her face is animated, giving her a lifelike appearance.
a digital painting of a woman with half white hair and half black hair and blue eyes wearing a black turtle neck sweater. Her face is animated, giving her a lifelike appearance.
a digital painting of a woman with half white hair and half black hair and blue eyes wearing a black dress. Her face is animated, giving her a lifelike appearance.
a woman with half white hair and half black hair and blue eyes wearing a black turtle neck sweater. Her face is animated, giving her a lifelike appearance.
a digital painting of a man with half white hair and half black hair wearing a black turtle neck sweater against a dark background.
a digital painting of a man with half white hair and half black hair and blue eyes wearing a black dress against a dark background.
a painting of a man with half white hair and half black hair wearing a black turtle neck sweater against a dark background.
a painting of a man with half white hair and half black hair wearing a black turtle neck sweater against a dark background.
a man with half white hair and half black hair and blue eyes wearing a black turtle neck sweater against a dark background.
The prompts consistently emphasize the hair color, eyes, face being animated, and dark clothing.
Performance Insights for Flux
Training Time: ~16 epochs on a 24GB GPU, relatively slow owing to the high-resolution data and resource-intensive architecture.
VRAM Usage: Mixed precision (bf16) and gradient checkpointing in the training configuration helped reduce the VRAM load, but it still remained high compared to SD 1.5.
Inference Time: Flux LoRA yields excellent one-shot results—often not requiring additional post-processing—at the cost of slower generation speed.
Potential Use Case: If ultra-high fidelity in a single generation pass is critical (e.g., concept art, cinematic rendering), Flux LoRA is advantageous despite longer runtimes.
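For reference, the memory-saving options mentioned above map onto kohya-style command-line flags roughly as follows. The script name, paths, rank, and learning rate are placeholders; the exact invocation depends on the Flux Gym / sd-scripts version in use.

```shell
# Illustrative kohya-style Flux LoRA invocation; paths and hyperparameters are placeholders
accelerate launch flux_train_network.py \
  --pretrained_model_name_or_path /models/flux1-dev.safetensors \
  --train_data_dir ./dataset/half_hair \
  --output_dir ./output/flux_lora \
  --network_module networks.lora_flux \
  --network_dim 16 --network_alpha 16 \
  --mixed_precision bf16 \
  --gradient_checkpointing \
  --max_train_epochs 16 \
  --learning_rate 1e-4
```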
Training with Stable Diffusion 1.5 LoRA
Overview of Stable Diffusion 1.5
Stable Diffusion 1.5 is a widely adopted model known for a good balance of speed and image quality. Training LoRA on SD 1.5 is typically more accessible for users with moderate VRAM (such as 8GB or 12GB GPUs), although the best results may still come from higher-memory GPUs.
As with Flux, the prompts were consistent and thematically matched; in fact, the exact same set of prompts was used as in the Flux training.
Such thematic consistency allows a straightforward comparison of final model outputs.
Performance Insights for SD 1.5
Training Time: Faster than Flux, especially at 512×512 resolution.
VRAM Usage: Lower memory footprint. Mixed precision (fp16), the xformers optimization, and lower resolution training help it run smoothly on mid-range GPUs.
Inference Speed: Quicker generation times, but images might need upscaling or minimal post-processing to match the detail of Flux outputs.
Benefit: For most creative or iterative workflows, SD 1.5 LoRA is often more convenient: fast iteration, a simpler setup, and good-enough quality that upscalers can further improve.
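The SD 1.5 run's lighter configuration looks roughly like this in kohya sd-scripts terms; again, paths and hyperparameters are placeholders rather than the exact values used.

```shell
# Illustrative kohya sd-scripts invocation for the SD 1.5 run; placeholders throughout
accelerate launch train_network.py \
  --pretrained_model_name_or_path ./models/v1-5-pruned-emaonly.safetensors \
  --train_data_dir ./dataset/half_hair \
  --output_dir ./output/sd15_lora \
  --network_module networks.lora \
  --network_dim 16 --network_alpha 16 \
  --resolution 512,512 \
  --mixed_precision fp16 \
  --xformers \
  --max_train_epochs 16 \
  --learning_rate 1e-4
```

Note the contrast with the Flux run: fp16 plus xformers attention at 512×512 instead of bf16 plus gradient checkpointing, which is what keeps this configuration within reach of mid-range GPUs.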
SD 1.5 LoRA output with 2× upscale
Comparative Analysis
Training Speed and VRAM
| Aspect | Flux LoRA | SD 1.5 LoRA |
| --- | --- | --- |
| Training Time (10–16 epochs) | Slower, more resource-intensive | Faster, lower resource usage |
| VRAM Requirement | High (24GB recommended) | Moderate (8–12GB feasible) |
| Precision | bf16 (gradient checkpointing) | fp16 (xformers, etc.) |
Interpretation: While Flux can handle extremely high-fidelity outputs, it requires significantly more training time and GPU memory.
Image Quality & Generation Speed
Flux:
Quality: High “one-shot” fidelity, minimal post-processing needed
Generation Time: Slower, especially at higher resolutions
SD 1.5:
Quality: Good, sometimes requires a bit of upscaling or inpainting for finer details
Generation Time: Faster, ideal for prototyping or iterative design
Flux LoRA (approx. 10–12 min/epoch)
SD 1.5 LoRA (approx. 5–6 min/epoch)
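Taking the per-epoch figures above at face value, a quick calculation gives the total wall-clock range for a 16-epoch run:

```python
def total_minutes(low: float, high: float, epochs: int) -> tuple[float, float]:
    """Wall-clock range (in minutes) for a full training run."""
    return (low * epochs, high * epochs)

flux_range = total_minutes(10, 12, 16)  # (160, 192) minutes, about 2.7-3.2 hours
sd15_range = total_minutes(5, 6, 16)    # (80, 96) minutes, about 1.3-1.6 hours
```

In other words, the SD 1.5 run finishes in roughly half the time of the Flux run at these rates.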
Key Observations & Insights
Accessibility vs. Fidelity: Flux delivers top-notch fidelity but demands more VRAM and time. SD 1.5 is more accessible and trades a bit of quality for speed.
Single-Pass Quality: If high quality with minimal post-processing is required, Flux might be worth the extra resources. For hobbyists or small studios, SD 1.5 LoRA is typically easier to integrate.
Dataset & Prompt Consistency: Both models learned the half-black/half-white hair concept effectively, showcasing robust adaptability to niche styles.
Conclusion
This project underscores the trade-offs between two potent LoRA training approaches:
Flux LoRA
Strengths: High-quality, visually stunning outputs in one pass
Limitations: Higher VRAM requirement, longer training and inference times
SD 1.5 LoRA
Strengths: Faster training and generation on moderate hardware
Limitations: May need additional upscaling or minor post-processing for best results
Both methods effectively produce dynamic images of characters with half-white, half-black hair. The choice hinges on resource availability and quality requirements. For enterprise-level or cinematic uses where fidelity is paramount, Flux is excellent. For broader accessibility and speed, SD 1.5 remains a go-to.
Environment & Tools
Hardware:
Training primarily on an Nvidia GPU with 24GB of VRAM (recommended for Flux)
Additional tests on 8GB GPUs for SD 1.5 feasibility
Software:
Flux Gym
SD Scripts for 1.5
Stable Diffusion 1.5 base model checkpoint “v1-5-pruned-emaonly.safetensors”
Mixed precision for VRAM efficiency (bf16 or fp16)
xformers optimization for SD 1.5
Note: Paths and references to local directories are for demonstration. They may be adjusted or replaced with cloud-based storage when sharing publicly.
Final Remarks
This portfolio project highlights my expertise in:
AI Model Training: Tailoring hyperparameters for different base models (Flux and SD 1.5).
LoRA Fine-Tuning: Structuring a dataset and prompts to achieve consistent character styling.
Technical Documentation: Presenting configurations and results in a clear, compelling format.
With an ever-evolving landscape of deep generative models, understanding the nuances of multiple approaches is invaluable. Flux vs. SD 1.5 is a classic example of quality vs. accessibility—and mastering both workflows enables delivering solutions that best fit a project’s needs.
References
Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., ... & Chen, W. (2021). LoRA: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685.