AI/ML Documentation

Your comprehensive guide to AI image generation, custom model training, and automated creative workflows, taking you from your first generated image to training your own artistic styles.

Table of Contents

  1. Why Learn AI Image Generation?
  2. Choose Your Path
    1. Documentation Overview
  3. Getting Started
    1. Hardware Requirements
    2. Software Requirements
    3. What You Should Already Know
    4. Quick Start: Your First Image
  4. Key Concepts
    1. How Diffusion Models Create Images
    2. The Model Stack
    3. Choosing a Workflow Tool
  5. Model Generations at a Glance
  6. Real-World Applications
    1. Creative Work
    2. Commercial Applications
    3. Technical Applications
  7. Best Practices
    1. Writing Effective Prompts
    2. Managing System Resources
    3. Improving Output Quality
  8. Troubleshooting
    1. Out of Memory Errors
    2. Slow Generation
    3. Poor Quality Results
  9. Automation and Integration
  10. Resources and Community
    1. Where to Find Models
    2. Learning and Help
    3. Research Papers
  11. Next Steps
  12. Related Documentation

Why Learn AI Image Generation?

AI image generation has transformed from a research curiosity into a practical creative tool. Artists use it to explore new styles, designers prototype concepts in minutes instead of hours, and developers build automated content pipelines. The technology is accessible enough to run on consumer hardware, yet powerful enough for professional applications.

Consider the following before diving in:

  • What do you want to create? Photorealistic images, artistic illustrations, anime characters, or product mockups each benefit from different approaches
  • How much control do you need? Quick generation versus precise artistic direction require different tools and workflows
  • Will you need custom styles or subjects? Training your own models unlocks personalized results that generic models cannot achieve

This documentation covers the practical skills you need, from understanding how the technology works to building production-ready workflows.

Choose Your Path

Different goals require different starting points. Find your path below:

| Your Goal | Start Here | Then Explore |
| --- | --- | --- |
| Generate images quickly | ComfyUI Guide | Base Models Comparison |
| Understand the technology | Stable Diffusion Fundamentals | Model Types |
| Train custom styles | LoRA Training | Advanced Techniques |
| Control composition precisely | ControlNet | ComfyUI Guide |
| Build automation pipelines | ComfyUI Guide | Output Formats |

Documentation Overview

Understanding the Foundations

  • Stable Diffusion Fundamentals - How diffusion models turn noise into images
  • Model Types - Checkpoints, VAEs, text encoders, LoRAs, and how they fit together
  • Base Models Comparison - Detailed technical differences between SD 1.5, SDXL, SD3, and FLUX

Practical Tools

  • ComfyUI Guide - Visual workflow builder for complex generation pipelines
  • LoRA Training - Create custom models for specific styles, characters, or concepts
  • ControlNet - Guide generation with poses, edges, depth maps, and more

Going Further

  • Advanced Techniques - Deeper workflows to explore once the basics feel comfortable
  • Output Formats - Output options for automation and delivery pipelines

Getting Started

Before generating your first image, you will need compatible hardware and a few software tools. The requirements scale with the complexity of your goals.

Hardware Requirements

| Use Case | GPU VRAM | System RAM | Storage |
| --- | --- | --- | --- |
| Basic generation (SD 1.5) | 4-6 GB | 16 GB | 50 GB |
| Standard workflows (SDXL) | 8-12 GB | 32 GB | 200 GB |
| Advanced models (FLUX, SD3) | 16-24 GB | 64 GB | 500 GB |
| LoRA training | 8-24 GB | 32-64 GB | 100 GB |

Most modern NVIDIA GPUs work well. AMD and Apple Silicon have growing support but may require additional configuration.

Software Requirements

  • Docker and Docker Compose - Containers simplify setup and ensure consistent environments
  • NVIDIA Docker runtime - Enables GPU acceleration inside containers
  • Python 3.10+ - Only needed for local scripts or custom development

What You Should Already Know

No prior AI experience is required, but you should be comfortable with:

  • Running commands in a terminal
  • Basic file and folder operations
  • Reading error messages and troubleshooting

Quick Start: Your First Image

The fastest way to generate an image is through ComfyUI’s web interface:

# Start ComfyUI and open http://localhost:8188
docker-compose up -d comfyui-server

Once the interface loads, you can use the default workflow immediately. Type your prompt, click “Queue Prompt,” and watch your image generate.

For programmatic access or automation, the MCP API accepts JSON requests:

curl -X POST http://localhost:8005/mcp/tool \
  -H "Content-Type: application/json" \
  -d '{"tool": "generate-image", "arguments": {"prompt": "mountain landscape at sunset"}}'

See the ComfyUI Guide for detailed setup and workflow tutorials.

Key Concepts

Understanding a few core ideas will help you make better decisions about models, settings, and workflows.

How Diffusion Models Create Images

Diffusion models are trained by watching images gradually dissolve into random noise, then learning to reverse that process. When you generate an image, the model starts with pure noise and progressively refines it into a coherent picture, guided by your text prompt.

This happens in “latent space” (a compressed mathematical representation) rather than pixel-by-pixel, which is why modern models can run on consumer hardware. Each generation step removes a bit of noise while steering toward your described content.

| Generation Approach | Steps Needed | Best For |
| --- | --- | --- |
| Standard diffusion | 20-50 | High quality, most control |
| LCM (Latent Consistency) | 4-8 | Fast iteration, previews |
| Turbo models | 1-4 | Real-time, interactive use |
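
If you prefer code to prose, here is a minimal text-to-image sketch using the Hugging Face diffusers library (a different stack from the ComfyUI workflows this guide focuses on). The num_inference_steps argument corresponds to the step counts in the table above:

import torch
from diffusers import StableDiffusionXLPipeline

# Load the SDXL base model in half precision (fits in the 8-12 GB VRAM tier)
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Each step removes some noise in latent space while steering toward the prompt
image = pipe(
    "mountain landscape at sunset",
    num_inference_steps=30,  # standard diffusion: 20-50 steps
    guidance_scale=7.0,      # how strongly the prompt guides denoising (CFG)
).images[0]
image.save("landscape.png")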

The Model Stack

AI image generation uses several specialized components working together:

  • Base Model - The foundation that understands image-text relationships (SD 1.5, SDXL, FLUX)
  • VAE - Compresses images for efficient processing, then decompresses the result
  • Text Encoder - Translates your prompt into numbers the model understands
  • LoRA - Small add-ons that teach the base model new styles or subjects
  • ControlNet - Guides composition using reference images, poses, or edges

Think of the base model as a skilled artist, LoRAs as specialized training, and ControlNet as a reference sketch the artist follows.
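
Seen from code, the stack maps onto a few objects. A minimal sketch with the diffusers library (the repository names are real public models; the LoRA file path is hypothetical):

import torch
from diffusers import AutoencoderKL, StableDiffusionXLPipeline

# VAE: compresses images to latents for processing and decodes the result
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16
)

# Base model + VAE; the text encoders come bundled inside the pipeline
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", vae=vae, torch_dtype=torch.float16
).to("cuda")

# LoRA: a small add-on teaching the base model a new style (hypothetical file)
pipe.load_lora_weights("loras/watercolor_style.safetensors")

image = pipe("castle on a cliff, watercolor style").images[0]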

Choosing a Workflow Tool

Several interfaces exist for working with these models:

| Tool | Best For | Learning Curve |
| --- | --- | --- |
| ComfyUI | Complex workflows, automation, experimentation | Moderate |
| Automatic1111/Forge | Feature-rich UI, extensions ecosystem | Low |
| Fooocus | Simple generation, beginners | Very low |
| InvokeAI | Professional canvas-based editing | Low-moderate |

This documentation focuses on ComfyUI because its node-based approach teaches you how the components connect and enables the most advanced workflows.

Model Generations at a Glance

The field evolves quickly. Here is how the major model families compare:

| Model | Resolution | VRAM Needed | Strengths | Best For |
| --- | --- | --- | --- | --- |
| SD 1.5 | 512x512 | 4-6 GB | Huge LoRA ecosystem, fast | Beginners, resource-limited setups |
| SDXL | 1024x1024 | 8-12 GB | Quality, composition | General creative work |
| SD3 | 1024x1024 | 10-16 GB | Text rendering, prompt following | Text-heavy images, precision |
| FLUX | 1024x1024+ | 12-24 GB | Photorealism, coherence | Professional quality, portraits |

When to use which:

  • Start with SDXL for the best balance of quality, speed, and ecosystem support
  • Use SD 1.5 if you have limited hardware or need specific legacy LoRAs
  • Choose FLUX when photorealism and fine details matter most
  • Pick SD3 when your images include text or need precise prompt interpretation

See Base Models Comparison for detailed technical differences.

Real-World Applications

People use AI image generation across many fields. Here are common scenarios and the approaches that work best:

Creative Work

Concept art and illustration - Generate variations quickly, then refine favorites manually. Use style LoRAs to maintain visual consistency across a project.

Character design - Train a character LoRA from reference sketches, then generate the character in different poses and situations. Combine with ControlNet for precise posing.

Environment art - Generate base landscapes or interiors, then use img2img for iterative refinement. ControlNet depth maps help maintain architectural consistency.
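
To make the ControlNet guidance mentioned above concrete, here is a sketch using the diffusers library and a public canny-edge ControlNet; the reference image path is hypothetical:

import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline

controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# Extract edges from a reference image; generation follows this structure
ref = np.array(Image.open("reference_pose.png").convert("RGB"))
edges = cv2.Canny(ref, 100, 200)
control_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

image = pipe("knight in ornate armor, dramatic lighting", image=control_image).images[0]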

Commercial Applications

Product visualization - Generate product mockups in various settings before physical prototypes exist. Works especially well for packaging and marketing concepts.

Marketing content - Create social media visuals, banner images, and promotional materials. Train brand-specific LoRAs for consistent visual identity.

Game development - Generate texture variations, background elements, and concept references. LoRAs trained on existing game art maintain style consistency.

Technical Applications

Dataset augmentation - Generate training data variations for other ML models. Particularly valuable when real data is scarce or expensive to collect.

Rapid prototyping - Visualize ideas before committing development resources. Especially useful in early design phases.

Best Practices

Writing Effective Prompts

Good prompts guide the model toward your vision. Structure them with the most important elements first (see the sketch after this list):

  • Subject first - “A knight in armor” beats “detailed, 4k, masterpiece, knight”
  • Be specific - “Golden retriever puppy” produces better results than “dog”
  • Include context - Mention lighting, setting, camera angle, and artistic style
  • Use negative prompts - Tell the model what to avoid (blurry, low quality, extra limbs)
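
Putting these rules together in a diffusers-based script (the same prompt and negative_prompt arguments apply to any SD or SDXL pipeline):

import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    # Subject first, then specifics, then context and style
    prompt="a knight in polished steel armor, standing in a misty forest "
           "at dawn, oil painting, dramatic low-angle shot",
    # Tell the model what to avoid
    negative_prompt="blurry, low quality, extra limbs",
).images[0]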

Managing System Resources

AI models consume significant GPU memory. These practices help (a code sketch follows the list):

  • Match model size to your hardware (see requirements table above)
  • Enable “low VRAM” modes in your workflow tool when needed
  • Close other GPU-intensive applications during generation
  • Use quantized model versions (fp16, fp8) for reduced memory usage
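
For script-based setups, a sketch of the fp16 and low-VRAM ideas using diffusers:

import torch
from diffusers import StableDiffusionXLPipeline

# The fp16 variant roughly halves download size and VRAM versus fp32
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
)

# Keep idle components in system RAM, moving them to the GPU only when needed:
# slower per image, but a practical "low VRAM" mode for smaller GPUs
pipe.enable_model_cpu_offload()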

Improving Output Quality

When results disappoint, try these adjustments (illustrated in code below the table):

| Problem | Solution |
| --- | --- |
| Blurry images | Increase steps (30-50), try different sampler |
| Wrong composition | Revise prompt structure, consider ControlNet |
| Artifacts/glitches | Lower CFG scale, check model compatibility |
| Style not matching | Adjust LoRA strength, verify trigger words |
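
With diffusers, the first rows' fixes (more steps, a different sampler, lower CFG) look roughly like this; DPMSolverMultistepScheduler is that library's equivalent of the DPM++ 2M sampler:

import torch
from diffusers import DPMSolverMultistepScheduler, StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Swap in a different sampler without reloading the model
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

image = pipe(
    "golden retriever puppy in a sunlit garden",
    num_inference_steps=40,  # more steps against blurriness
    guidance_scale=6.0,      # lower CFG if artifacts appear
).images[0]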

Troubleshooting

When something goes wrong, these are the most common causes and fixes:

Out of Memory Errors

Your GPU ran out of VRAM. Try these solutions in order:

  1. Use a smaller model version (fp16 instead of fp32, fp8 for FLUX)
  2. Reduce image resolution
  3. Enable “low VRAM” or “CPU offloading” in your workflow tool
  4. Close other applications using the GPU

Slow Generation

Slow generation usually points to inefficient settings (a quick GPU check follows the list):

  1. Reduce sampling steps (20-30 is often sufficient)
  2. Switch to a faster sampler (DPM++ 2M, Euler)
  3. Verify GPU is being used (check nvidia-smi)
  4. Ensure models are loaded once, not reloaded per image
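
For step 3, here is a quick check from Python (assumes a PyTorch-based install):

import torch

# If this prints False, generation is running on the CPU and will be far slower
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))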

Poor Quality Results

When images do not match expectations:

  1. Review and refine your prompt (be more specific)
  2. Experiment with CFG scale (try 5-10 range)
  3. Increase sampling steps for more detail
  4. Verify model and LoRA compatibility

Automation and Integration

For batch processing or integration into larger systems, the MCP API provides programmatic access:

import requests

# Generate an image via API
response = requests.post("http://localhost:8005/mcp/tool", json={
    "tool": "generate-image",
    "arguments": {
        "prompt": "cyberpunk city at night",
        "checkpoint": "sdxl_base.safetensors"
    }
})
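
# The response shape depends on the MCP server's tool schema; inspect it directly
response.raise_for_status()  # surface HTTP errors instead of failing silently
print(response.json())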

See the ComfyUI Guide for complete API documentation and workflow submission examples.

Resources and Community

Where to Find Models

  • CivitAI - Largest collection of LoRAs, checkpoints, and community models
  • Hugging Face - Official model releases and research models

Learning and Help

Research Papers

For those interested in the underlying technology:

Next Steps

Based on your goals, here is where to go next:

Want to generate images now? Start with the ComfyUI Guide for hands-on workflow building.

Want to understand the technology? Read Stable Diffusion Fundamentals for the concepts behind generation.

Want to create custom styles? Jump to LoRA Training to learn how to train your own models.

Related Documentation

For broader AI and machine learning concepts:

Hardware Note: This documentation assumes NVIDIA GPU access. AMD and Apple Silicon support is improving but may require additional configuration and may lack some features.