AI Half & Half: Flux vs SD 1.5

A comparative study of Flux and Stable Diffusion 1.5 LoRA training and image generation.

Introduction & Motivation

In the field of AI-generated imagery, small adjustments to base models can yield drastic changes in output quality and style. LoRA (Low-Rank Adaptation) is one such technique that allows for the fine-tuning of large models with minimal computational overhead. In this exploration, I showcase my skills in training two LoRA models:

  1. Flux-based LoRA (dev base model)
  2. Stable Diffusion 1.5 LoRA (popular open-source model)

These models were trained on a synthetic dataset of 10 images (each 2048×2048), focusing on an artistic concept of characters with half-white and half-black hair. The project’s primary goal: compare training configurations, performance, image quality, and VRAM usage.
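To make the mechanism concrete, below is a minimal PyTorch sketch of the low-rank update that LoRA adds around a frozen weight matrix. This is my own illustration of the idea, not the implementation used by the training scripts:

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update:
    y = W x + (alpha / r) * B (A x), with rank r << min(d_in, d_out)."""
    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the pretrained weights stay frozen
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        # zero init for B: training starts from the unchanged base model
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)

Only lora_A and lora_B receive gradients, which is why LoRA training fits in far less VRAM than full fine-tuning; r = 4 here mirrors the --network_dim 4 used in the Flux config below.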

Key Highlights

  • Dataset: 10 synthetic images, each depicting half-white, half-black hairstyles
  • Models: Flux vs. Stable Diffusion 1.5
  • Training: LoRA fine-tuning with prompts focusing on half-white/half-black hair
  • Comparison: Speed, VRAM usage, feature reproducibility

Dataset & Concept

The dataset contains 10 high-resolution (2048×2048) images, each carefully curated to depict characters with a distinct half-white, half-black hair color split. This ensures the models learn to reconstruct this unique hair pattern consistently. (Data generated with Midjourney.)

  • Why Half & Half Hair?
    • Visually distinct concept to test how quickly and effectively models adapt to a particular style.
    • Encourages the model to learn fine details (such as color partition) and handle unusual patterns.

Training prompts paired with these images consistently mention:

“…a person with half white hair and half black hair…”
and often emphasize facial features and clothing details.

By maintaining consistency in the prompts, both models could be fairly compared on how they capture and recreate this distinctive style.


Training with Flux LoRA

Overview of Flux

Flux is a specialized model that offers high-fidelity one-shot image generation. It can produce exquisitely detailed images but is more resource-intensive during both training and inference. Here, I used an NVIDIA GPU with 24GB of VRAM to accommodate its higher memory demands.

Flux LoRA Command & Config

accelerate launch \
  --mixed_precision bf16 \
  --num_cpu_threads_per_process 1 \
  sd-scripts/flux_train_network.py \
  --pretrained_model_name_or_path "/data/app/models/unet/flux1-dev.sft" \
  --clip_l "/data/app/models/clip/clip_l.safetensors" \
  --t5xxl "/data/app/models/clip/t5xxl_fp16.safetensors" \
  --ae "/data/app/models/vae/ae.sft" \
  --cache_latents_to_disk \
  --save_model_as safetensors \
  --sdpa --persistent_data_loader_workers \
  --max_data_loader_n_workers 2 \
  --seed 42 \
  --gradient_checkpointing \
  --mixed_precision bf16 \
  --save_precision bf16 \
  --network_module networks.lora_flux \
  --network_dim 4 \
  --optimizer_type adamw8bit \
  --sample_prompts="/data/app/outputs/whtblckhar10/sample_prompts.txt" \
  --sample_every_n_steps="160" \
  --learning_rate 8e-4 \
  --cache_text_encoder_outputs \
  --cache_text_encoder_outputs_to_disk \
  --fp8_base \
  --highvram \
  --max_train_epochs 16 \
  --save_every_n_epochs 4 \
  --dataset_config "/data/app/outputs/whtblckhar10/dataset.toml" \
  --output_dir "/data/app/outputs/whtblckhar10" \
  --output_name whtblckhar10 \
  --timestep_sampling shift \
  --discrete_flow_shift 3.1582 \
  --model_prediction_type raw \
  --guidance_scale 1 \
  --loss_type l2
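
A few of the less self-explanatory flags, as I understand them from the sd-scripts documentation:

  • --fp8_base loads the base Flux model in fp8 precision to reduce VRAM during training.
  • --cache_text_encoder_outputs / --cache_text_encoder_outputs_to_disk precompute the CLIP-L and T5-XXL embeddings so the text encoders need not stay resident throughout training.
  • --timestep_sampling shift with --discrete_flow_shift 3.1582 biases which diffusion timesteps are sampled, a setting commonly recommended for flux1-dev LoRA training.
  • --guidance_scale 1 sets the distilled guidance value embedded during training of the dev model (a training-time conditioning input, not the inference CFG scale).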

Dataset Configuration (dataset.toml)

[general]
shuffle_caption = false
caption_extension = '.txt'
keep_tokens = 1

[[datasets]]
resolution = 512
batch_size = 1
keep_tokens = 1

  [[datasets.subsets]]
  image_dir = '/data/app/datasets/whtblckhar10'
  class_tokens = 'whtblckhar10'
  num_repeats = 10
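
With this dataset config, one epoch is 10 images × 10 repeats ÷ batch size 1 = 100 optimizer steps. The 16-epoch Flux run therefore totals 1,600 steps, and --sample_every_n_steps=160 yields ten sample generations over the run, roughly one per 1.6 epochs.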

Flux LoRA Training Prompts

  1. an animated woman with half white hair and half black hair with blue eyes wearing a black dress against a brown background.
  2. a digital painting of a woman with half white hair and half black hair and blue eyes wearing a black turtle neck sweater. Her face is animated, giving her a lifelike appearance.
  3. a digital painting of a woman with half white hair and half black hair and blue eyes wearing a black turtle neck sweater. Her face is animated, giving her a lifelike appearance.
  4. a digital painting of a woman with half white hair and half black hair and blue eyes wearing a black dress. Her face is animated, giving her a lifelike appearance.
  5. a woman with half white hair and half black hair and blue eyes wearing a black turtle neck sweater. Her face is animated, giving her a lifelike appearance.
  6. a digital painting of a man with half white hair and half black hair wearing a black turtle neck sweater against a dark background.
  7. a digital painting of a man with half white hair and half black hair and blue eyes wearing a black dress against a dark background.
  8. a painting of a man with half white hair and half black hair wearing a black turtle neck sweater against a dark background.
  9. a painting of a man with half white hair and half black hair wearing a black turtle neck sweater against a dark background.
  10. a man with half white hair and half black hair and blue eyes wearing a black turtle neck sweater against a dark background.

The prompts consistently emphasize the hair color, the eyes, the animated quality of the face, and dark clothing.


Performance Insights for Flux

  • Training Time: ~16 epochs on a 24GB GPU ran relatively slowly, owing to the high-resolution data and Flux's resource-intensive architecture.
  • VRAM Usage: Mixed precision (bf16) and gradient checkpointing helped reduce the VRAM load, but usage still remained high compared to SD 1.5.
  • Inference Time: Flux LoRA yields excellent one-shot results—often not requiring additional post-processing—at the cost of slower generation speed.

Potential Use Case: If ultra-high fidelity in a single generation pass is critical (e.g., concept art, cinematic rendering), Flux LoRA is advantageous despite longer runtimes.

Flux LoRA output

Model download URL: White X Black Hair. Flux LoRA - v1.0 | Flux LoRA | Civitai


Training with Stable Diffusion 1.5 LoRA

Overview of SD 1.5

Stable Diffusion 1.5 is a widely adopted model known for a good balance of speed and image quality. Training LoRA on SD 1.5 is typically more accessible for users with moderate VRAM (like 8GB or 12GB GPUs), although the best results might still come from higher-memory GPUs.

SD 1.5 LoRA Config

pretrained_model_name_or_path = "D:/Forge/webui/models/Stable-diffusion/v1-5-pruned-emaonly.safetensors"
train_data_dir = "E:/OneDrive/Desktop/TD"
resolution = "512,512"
enable_bucket = true
min_bucket_reso = 256
max_bucket_reso = 1024
output_name = "blkwht10"
output_dir = "./output"
save_model_as = "safetensors"
save_every_n_epochs = 2
max_train_epochs = 10
train_batch_size = 1
network_train_unet_only = false
network_train_text_encoder_only = false
learning_rate = 0.0001
unet_lr = 0.0001
text_encoder_lr = 0.00001
lr_scheduler = "cosine_with_restarts"
optimizer_type = "AdamW8bit"
lr_scheduler_num_cycles = 1
network_module = "networks.lora"
network_dim = 32
network_alpha = 32
logging_dir = "./logs"
caption_extension = ".txt"
shuffle_caption = true
keep_tokens = 0
max_token_length = 225
seed = 1337
prior_loss_weight = 1
clip_skip = 2
mixed_precision = "fp16"
save_precision = "fp16"
xformers = true
cache_latents = true
persistent_data_loader_workers = true
lr_warmup_steps = 0
sample_prompts = "(masterpiece, best quality:1.2), 1girl, solo, woman with half white hair..."
sample_sampler = "euler_a"
sample_every_n_epochs = 2
gpu_ids = [ "0" ]
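
Two notes on this config, plus an inference sketch. First, with network_alpha = network_dim = 32, the LoRA scaling factor alpha/dim works out to 1.0, so the configured learning rates apply unscaled. Second, once training finishes, the saved blkwht10.safetensors can be loaded with the diffusers library; the snippet below is a minimal sketch of that route (model ID, paths, and prompt are illustrative, not the exact setup used here):

import torch
from diffusers import StableDiffusionPipeline

# Load the SD 1.5 base model and attach the trained LoRA weights.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("./output", weight_name="blkwht10.safetensors")

# clip_skip=2 matches the training config (supported in recent diffusers versions).
image = pipe(
    "1girl, solo, woman with half white hair and half black hair, black turtleneck",
    num_inference_steps=30,
    guidance_scale=7.5,
    clip_skip=2,
).images[0]
image.save("half_and_half.png")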

SD 1.5 LoRA Training Prompts

The SD 1.5 run used exactly the same ten training prompts as the Flux run (listed above), keeping the two models thematically matched.


Such thematic consistency allows a straightforward comparison of final model outputs.


Performance Insights for SD 1.5

  • Training Time: Faster than Flux, especially at 512×512 resolution.
  • VRAM Usage: Lower memory footprint. Mixed precision (fp16), the xformers optimization, and lower resolution training help it run smoothly on mid-range GPUs.
  • Inference Speed: Quicker generation times, but images might need upscaling or minimal post-processing to match the detail of Flux outputs.

Benefit: For most creative or iterative workflows, SD 1.5 LoRA is often more convenient: fast iteration, simpler setup, and good-enough quality that can be polished further with upscalers (one possible upscaling pass is sketched below).
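
The outputs shown below were upscaled 2×. One way to script such a pass is with diffusers' x2 latent upscaler; this is a hedged sketch of that route, not necessarily the exact tool used for these images (model IDs, paths, and prompt are illustrative):

import torch
from diffusers import (
    StableDiffusionPipeline,
    StableDiffusionLatentUpscalePipeline,
)

prompt = "woman with half white hair and half black hair, black turtleneck"

# Base SD 1.5 pass with the LoRA attached, keeping the result in latent
# space so it can be fed straight to the upscaler.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("./output", weight_name="blkwht10.safetensors")
low_res_latents = pipe(prompt, output_type="latent").images

# 2x latent upscale; guidance_scale=0 is the documented setting for this pipeline.
upscaler = StableDiffusionLatentUpscalePipeline.from_pretrained(
    "stabilityai/sd-x2-latent-upscaler", torch_dtype=torch.float16
).to("cuda")
image = upscaler(
    prompt=prompt,
    image=low_res_latents,
    num_inference_steps=20,
    guidance_scale=0,
).images[0]
image.save("half_and_half_2x.png")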

SD 1.5 LoRA output with 2× upscale

Comparative Analysis

Training Speed and VRAM

Aspect                         Flux LoRA                          SD 1.5 LoRA
Training Time (10–16 epochs)   Slower, more resource-intensive    Faster, lower resource usage
VRAM Requirement               High (24GB recommended)            Moderate (8–12GB feasible)
Precision                      bf16 (gradient checkpointing)      fp16 (xformers, etc.)

Interpretation: While Flux can handle extremely high-fidelity outputs, it requires significantly more training time and GPU memory.


Image Quality & Generation Speed

  • Flux:
    • Quality: High “one-shot” fidelity, minimal post-processing needed
    • Generation Time: Slower, especially at higher resolutions
  • SD 1.5:
    • Quality: Good, sometimes requires a bit of upscaling or inpainting for finer details
    • Generation Time: Faster, ideal for prototyping or iterative design
  • Flux LoRA: approx. 10–12 min/epoch
  • SD 1.5 LoRA: approx. 5–6 min/epoch
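
At those per-epoch rates, the configured runs work out to roughly 16 × 10–12 ≈ 160–190 minutes for Flux versus 10 × 5–6 ≈ 50–60 minutes for SD 1.5, about a threefold difference in wall-clock training time.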

Key Observations & Insights

  • Accessibility vs. Fidelity:
    Flux delivers top-notch fidelity but demands more VRAM and time. SD 1.5 is more accessible, trading a bit of quality for speed.
  • Single-Pass Quality:
    If minimal post-processing is required, Flux might be worth the extra resources. For hobbyists or small studios, SD 1.5 LoRA is typically easier to integrate.
  • Dataset & Prompt Consistency:
    Both models learned the half-black/half-white hair concept effectively, showcasing robust adaptability to niche styles.

Conclusion

This project underscores the trade-offs between two potent LoRA training approaches:

  1. Flux LoRA
    • Strengths: High-quality, visually stunning outputs in one pass
    • Limitations: Higher VRAM requirement, longer training and inference times
  2. SD 1.5 LoRA
    • Strengths: Faster training, lower hardware demands, widely compatible
    • Limitations: May need additional upscaling or minor post-processing for best results

Both methods effectively produce dynamic images of characters with half-white, half-black hair. The choice hinges on resource availability and quality requirements. For enterprise-level or cinematic uses where fidelity is paramount, Flux is excellent. For broader accessibility and speed, SD 1.5 remains a go-to.


Environment & Tools

  • Hardware:
    • Training primarily on an NVIDIA GPU with 24GB of VRAM (recommended for Flux)
    • Additional tests on 8GB GPUs for SD 1.5 feasibility
  • Software:
    • Flux Gym
    • SD Scripts for 1.5
    • Stable Diffusion 1.5 base model (v1-5-pruned-emaonly.safetensors)
    • Mixed precision for VRAM efficiency (bf16 or fp16)
    • xformers optimization for SD 1.5

Note: Paths and references to local directories are for demonstration. They may be adjusted or replaced with cloud-based storage when sharing publicly.


Final Remarks

This portfolio project highlights my expertise in:

  • AI Model Training: Tailoring hyperparameters for different base models (Flux and SD 1.5).
  • LoRA Fine-Tuning: Structuring a dataset and prompts to achieve consistent character styling.
  • Comparative Analysis: Evaluating runtime performance, VRAM usage, and output quality objectively.
  • Technical Documentation: Presenting configurations and results in a clear, compelling format.

With an ever-evolving landscape of deep generative models, understanding the nuances of multiple approaches is invaluable. Flux vs. SD 1.5 is a classic example of quality vs. accessibility—and mastering both workflows enables delivering solutions that best fit a project’s needs.


References

Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., ... & Chen, W. (2021). LoRA: Low-Rank Adaptation of Large Language Models. arXiv preprint arXiv:2106.09685.
