ControlNet Guide
Master ControlNet for precise control over AI image generation using poses, edges, depth, and more.
What is ControlNet?
ControlNet is a neural network architecture that adds spatial control to diffusion models. It allows you to guide image generation using various types of conditioning inputs like human poses, edge maps, depth maps, and more, while maintaining the quality and capabilities of the base model.
As of 2024, ControlNet has evolved significantly with new control types, better preprocessing, and support for newer models. The ecosystem now includes alternatives like T2I-Adapter, IP-Adapter, and InstantID that offer different trade-offs between control precision and flexibility.
How ControlNet Works
Input Image → Preprocessor → Control Map
↓
Text Prompt → Base Model + ControlNet → Controlled Output
ControlNet creates a trainable copy of the diffusion model’s encoder blocks, which learns to respond to specific spatial conditions while preserving the original model’s generation capabilities.
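The trainable-copy idea can be sketched in a few lines. This is a hypothetical NumPy toy (real implementations operate on latent feature maps inside the U-Net), but it shows the key property: the copy's output passes through a zero-initialized "zero convolution," so at the start of training the combined model behaves exactly like the frozen base model.

```python
import numpy as np

def zero_conv(x, weight, bias):
    # 1x1 "zero convolution": initialized to zeros, so the trainable
    # copy contributes nothing until training moves the weights
    return x * weight + bias

def controlled_block(x, cond, base_block, copy_block, w=0.0, b=0.0):
    # Frozen base path plus the trainable copy's (initially zero) output
    base_out = base_block(x)
    copy_out = copy_block(x + cond)  # the copy sees the control signal
    return base_out + zero_conv(copy_out, w, b)

# With zero-initialized weights, the output equals the base model's output
x = np.ones(4)
cond = np.full(4, 0.5)
out = controlled_block(x, cond, base_block=lambda v: v * 2, copy_block=lambda v: v * 3)
```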
ControlNet Types
Pose Control
OpenPose
Detects human body keypoints and skeleton structure.
Purpose: Human pose transfer
Keypoints: 18-25 body points
Includes: Body, hands, face
Best for: Character consistency, pose reference
Preprocessor Options:
- openpose_full: Body + hands + face
- openpose_body: Body only
- openpose_hand: Hands focus
- openpose_face: Face landmarks
Example Workflow:
[Reference Image] → [OpenPose Preprocessor] → [Pose Skeleton]
↓
"A warrior in armor" → [ControlNet OpenPose] → [Posed Character]
DWPose
More accurate pose estimation with better occlusion handling.
Advantages: Better accuracy, stable tracking
Keypoints: More detailed skeleton
Performance: Slower but more reliable
Use case: Complex poses, partial visibility
New: Whole-body estimation including hands/face
RTMPose
Real-time pose estimation with good accuracy.
Speed: Fastest option available
Accuracy: Good balance
Use case: Real-time applications
Platforms: Mobile-friendly
Edge Detection
Canny Edge
Classic edge detection algorithm for clean line extraction.
Purpose: Preserve shapes and outlines
Parameters: Low/High threshold
Output: Binary edge map
Best for: Architecture, objects, clean lines
Parameter Guide:
{
"low_threshold": 100, # Lower = more edges
"high_threshold": 200, # Higher = fewer edges
}
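To see why the two thresholds move edge counts in opposite directions, here is a toy classification over gradient magnitudes (a simplified sketch: real Canny also performs non-maximum suppression and keeps weak edges only when they connect to a strong edge):

```python
import numpy as np

def classify_edges(grad_mag, low, high):
    # Above `high`: strong edge. Between `low` and `high`: weak edge
    # (kept by real Canny only if linked to a strong edge). Below `low`:
    # discarded. So lowering `low` admits more edges, and raising `high`
    # keeps fewer.
    strong = grad_mag >= high
    weak = (grad_mag >= low) & (grad_mag < high)
    return strong, weak

grads = np.array([50, 120, 180, 250])
strong, weak = classify_edges(grads, low=100, high=200)
```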
MLSD (M-LSD)
Detects straight lines and geometric structures.
Purpose: Architectural elements
Specialty: Straight line detection
Best for: Buildings, interiors, technical drawings
SoftEdge (HED/PIDI)
Preserves more subtle edge information.
Methods: HED, PiDiNet (PIDI)
Purpose: Artistic edge preservation
Quality: Softer, more natural edges
Best for: Organic subjects, artistic styles
Depth Control
MiDaS
Monocular depth estimation for general scenes.
Versions: MiDaS v2.1, v3.0
Resolution: Multiple model sizes
Output: Relative depth map
Best for: General depth control
Zoe Depth
More accurate depth estimation with metric depth.
Accuracy: Superior to MiDaS
Type: Metric depth (actual distances)
Training: NYU Depth v2, KITTI
Best for: Realistic depth, outdoor scenes
LeReS
Learning to Recover 3D Scene Shape.
Quality: High quality depth
Features: Handles complex scenes
Speed: Slower than MiDaS
Best for: Complex compositions
Semantic Control
Segmentation
Uses semantic segmentation maps for region-based control.
Models: ADE20K, COCO, custom
Classes: 150+ object categories
Control: Per-region styling
Best for: Scene composition
Color Mapping Example:
segmentation_colors = {
"sky": [134, 193, 249],
"building": [128, 128, 128],
"tree": [0, 128, 0],
"person": [255, 0, 0],
"ground": [139, 69, 19]
}
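A segmentation control image is just the class map painted with a palette like the one above. A minimal sketch (the class indices and colors here are illustrative, not an official ADE20K mapping):

```python
import numpy as np

# Illustrative palette: class index -> RGB color
palette = {
    0: [134, 193, 249],  # sky
    1: [128, 128, 128],  # building
    2: [0, 128, 0],      # tree
}

def class_map_to_control(class_map, palette):
    # Paint each class index with its RGB color to form the control image
    h, w = class_map.shape
    out = np.zeros((h, w, 3), dtype=np.uint8)
    for cls, color in palette.items():
        out[class_map == cls] = color
    return out

seg = np.array([[0, 0, 1],
                [2, 2, 1]])
control = class_map_to_control(seg, palette)
```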
Normal Maps
Surface normal information for 3D-aware generation.
Purpose: 3D surface orientation
Format: RGB encoded normals
Use case: 3D consistency, lighting
Best for: Products, sculptures
Line Art Control
Anime Line Art
Extracts clean lines suitable for anime/manga style.
Purpose: Anime/manga line extraction
Cleanliness: Very clean lines
Style: Manga-appropriate
Best for: Anime characters, manga
Scribble
Converts rough sketches to control inputs.
Modes: Scribble, Fake Scribble
Input: Hand-drawn sketches
Tolerance: High noise tolerance
Best for: Quick ideation
Special Controls
Shuffle
Rearranges image content while preserving style.
Purpose: Style transfer with layout change
Method: Spatial shuffling
Randomness: Controllable
Best for: Creative variations
Tile
Enables tiled/seamless generation and upscaling.
Purpose: Seamless textures, upscaling
Method: Overlapping tiles
Quality: Maintains consistency
Best for: Patterns, super-resolution
Inpaint
Specialized control for masked region generation.
Input: Image + Mask
Control: Only masked areas
Blending: Seamless integration
Best for: Object removal, editing
Installation and Setup
ComfyUI Installation
# Install ControlNet models
cd ComfyUI/models/controlnet
# Download models (example for SD 1.5)
wget https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_openpose.pth
wget https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11f1p_sd15_depth.pth
# For SDXL ControlNet (rename the generic diffusers filename on download)
wget -O controlnet-canny-sdxl-1.0.fp16.safetensors https://huggingface.co/diffusers/controlnet-canny-sdxl-1.0/resolve/main/diffusion_pytorch_model.fp16.safetensors
# Install preprocessor nodes
cd ../../custom_nodes
git clone https://github.com/Fannovel16/comfyui_controlnet_aux.git
Required Components
Core:
- ComfyUI-ControlNet-Aux (preprocessors)
- ControlNet models (specific to base model)
- Base diffusion model (SD 1.5, SDXL, etc.)
Optional:
- Custom preprocessors
- Additional ControlNet models
Basic Workflows
Simple ControlNet Workflow
# ComfyUI nodes
[Load Image] → [ControlNet Preprocessor] → Control Image
↓
[Load ControlNet] → [Apply ControlNet] ← [Positive Prompt]
↓
[KSampler] → [VAE Decode] → [Save]
Multi-ControlNet Setup
# Stack multiple ControlNets
[OpenPose Control] → [Apply ControlNet 1]
↓
[Depth Control] → [Apply ControlNet 2]
↓
[KSampler]
ControlNet Parameters
{
"strength": 1.0, # 0-2, control influence
"start_percent": 0.0, # When to start applying
"end_percent": 1.0, # When to stop applying
"control_mode": "balanced", # balanced/prompt/control
}
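`start_percent` and `end_percent` map onto the sampler's step range. A hedged sketch of how a sampler might decide whether control applies at a given step (parameter names follow the config above, not any specific implementation):

```python
def control_active(step, total_steps, start_percent=0.0, end_percent=1.0):
    # Progress through sampling, in [0, 1]
    progress = step / max(total_steps - 1, 1)
    return start_percent <= progress <= end_percent

# With 20 steps and end_percent=0.8, control is dropped for the final steps
active = [control_active(s, 20, end_percent=0.8) for s in range(20)]
```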
Advanced Techniques
Strength Scheduling
Vary ControlNet influence during generation:
# Reduce control over time for more creativity
strength_schedule = {
0: 1.0, # Full control at start
0.5: 0.7, # Reduce midway
0.8: 0.3, # Minimal at end
}
# Advanced scheduling with curves
import math

def cosine_schedule(step, total_steps):
    progress = step / total_steps
    return 0.5 * (1 + math.cos(math.pi * progress))
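The keyframe dict above can be turned into a per-step strength with simple linear interpolation. This is a sketch of the idea; UIs like ComfyUI usually expose scheduling through dedicated nodes rather than raw dicts:

```python
def strength_at(progress, schedule):
    # Linearly interpolate between keyframes, e.g. {0: 1.0, 0.5: 0.7, 0.8: 0.3}
    keys = sorted(schedule)
    if progress <= keys[0]:
        return schedule[keys[0]]
    for a, b in zip(keys, keys[1:]):
        if progress <= b:
            t = (progress - a) / (b - a)
            return schedule[a] + t * (schedule[b] - schedule[a])
    # Past the last keyframe: hold its value
    return schedule[keys[-1]]

schedule = {0: 1.0, 0.5: 0.7, 0.8: 0.3}
```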
Multiple Control Combinations
Pose + Depth
# Character in specific pose with depth
controls = [
{"type": "openpose", "strength": 1.0},
{"type": "depth", "strength": 0.5}
]
Edge + Segmentation
# Precise shapes with semantic regions
controls = [
{"type": "canny", "strength": 0.8},
{"type": "segmentation", "strength": 0.6}
]
Control Mode Selection
| Mode | Description | Use Case |
|---|---|---|
| Balanced | Equal weight to prompt and control | General use |
| Prompt | Prioritize text prompt | Creative freedom |
| Control | Prioritize ControlNet | Exact matching |
Resolution Considerations
# ControlNet resolution tips
if base_model == "SD1.5":
control_res = 512
elif base_model == "SDXL":
control_res = 1024
# Preprocess to match
control_image = resize_image(input_image, control_res)
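`resize_image` above is a placeholder. A hedged sketch of a resize helper that keeps aspect ratio and snaps both dimensions to multiples of 8, which latent diffusion models generally require:

```python
def fit_resolution(width, height, target, multiple=8):
    # Scale the shorter side to `target`, then snap both sides to the
    # nearest multiple (latent models need divisible dimensions)
    scale = target / min(width, height)
    new_w = round(width * scale) // multiple * multiple
    new_h = round(height * scale) // multiple * multiple
    return new_w, new_h
```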
Preprocessing Best Practices
Image Preparation
def prepare_control_image(image, control_type):
    # The helpers below are placeholders for your own pipeline functions
    # Ensure correct resolution
    image = resize_to_model_resolution(image)
    # Enhance contrast for edge detection
    if control_type in ["canny", "mlsd"]:
        image = enhance_contrast(image)
    # Denoise for cleaner pose extraction
    if control_type == "openpose":
        image = denoise(image)
    return image
Preprocessor Selection
| Input Quality | Recommended Preprocessor |
|---|---|
| Clean photo | Standard preprocessor |
| Noisy image | Robust variants (DW, LeReS+) |
| Artistic | Soft variants (HED, PIDI) |
| Technical | Precise variants (MLSD) |
Custom Preprocessing
# Create custom control maps
import cv2
import numpy as np
def custom_edge_detection(image):
    # Canny expects an 8-bit single-channel image
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Combine multiple edge detectors
    canny = cv2.Canny(gray, 50, 150)
    laplacian = np.abs(cv2.Laplacian(gray, cv2.CV_64F))
    # Weighted combination, clipped back to the 8-bit range
    combined = 0.7 * canny + 0.3 * laplacian
    return np.clip(combined, 0, 255).astype(np.uint8)
Model Compatibility
Recent Developments
New Control Types
- QR Code Control: Generate scannable QR codes in artistic styles
- Illumination Control: Precise lighting direction control
- Recolor: Change colors while preserving structure
- Blur Control: Depth-of-field and focus control
Model Support
- SDXL ControlNet: Full support with higher quality
- SD3 ControlNet: In development
- FLUX Support: Coming soon with new architecture
ControlNet Versions
| Base Model | ControlNet Version | File Pattern |
|---|---|---|
| SD 1.5 | v1.1 | `control_v11*_sd15_*.pth` |
| SD 2.1 | v1.1 SD2 | `control_v11*_sd21_*.pth` |
| SDXL | SDXL v1 | `controlnet-*-sdxl-1.0.safetensors` |
| SD3 | In Development | `controlnet-*-sd3-*.safetensors` |
| FLUX | Coming Soon | TBD |
T2I-Adapter
Alternative to ControlNet with different characteristics:
Advantages:
- Smaller model size (~80MB vs ~1.4GB)
- Faster inference
- Multiple adapters combinable
- Lower VRAM usage
- Better for real-time applications
Disadvantages:
- Sometimes less precise
- Fewer available types
- May require more prompt engineering
IP-Adapter Integration
Combine ControlNet with IP-Adapter:
[ControlNet Depth] + [IP-Adapter Style] = Precise structure with reference style
Common Workflows
Character Consistency
# Maintain character across poses
workflow = {
"reference_image": "character_reference.png",
"preprocessor": "openpose_full",
"prompt_template": "character_name, {pose_description}",
"strength": 0.9,
"seed": "fixed_for_consistency"
}
Architecture Visualization
# Technical drawing to render
workflow = {
"line_drawing": "floor_plan.png",
"preprocessor": "mlsd",
"prompt": "modern house interior, photorealistic",
"control_strength": 1.0,
"cfg_scale": 7.5
}
Style Transfer with Structure
# Preserve composition, change style
workflow = {
"content_image": "photo.jpg",
"preprocessors": ["depth", "softedge"],
"style_prompt": "oil painting in the style of van gogh",
"control_balance": {
"depth": 0.7,
"softedge": 0.5
}
}
Troubleshooting
Common Issues
Preprocessor Not Detecting Features
# Solutions
- Increase image contrast
- Try different preprocessor variant
- Adjust detection thresholds
- Use manual annotation tools
Over-controlling Generation
# Reduce control influence
{
"strength": 0.6, # Lower from 1.0
"end_percent": 0.8, # Stop control early
"cfg_scale": 9, # Increase prompt importance
}
Artifacts at Edges
# Edge artifact mitigation
- Use softedge instead of canny
- Blur control map slightly
- Reduce control strength at boundaries
- Enable "soft" control mode
Performance Optimization
# Memory-efficient ControlNet
{
"low_vram_mode": true,
"preprocessor_device": "cpu",
"control_net_device": "cuda",
"offload_when_unused": true
}
Creative Applications
Hybrid Controls
# Combine photo and sketch
photo_depth = extract_depth(photo)
sketch_lines = process_sketch(drawing)
combined_control = blend_controls(photo_depth, sketch_lines, alpha=0.5)
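`blend_controls` above is hypothetical; a minimal weighted blend of two same-shaped uint8 control maps might look like this:

```python
import numpy as np

def blend_controls(map_a, map_b, alpha=0.5):
    # Weighted average of two uint8 control maps of the same shape;
    # alpha=1.0 keeps only map_a, alpha=0.0 only map_b
    a = map_a.astype(np.float32)
    b = map_b.astype(np.float32)
    out = alpha * a + (1 - alpha) * b
    return np.clip(out, 0, 255).astype(np.uint8)

depth = np.full((2, 2), 200, dtype=np.uint8)
lines = np.full((2, 2), 100, dtype=np.uint8)
blended = blend_controls(depth, lines, alpha=0.5)
```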
Temporal Consistency
For animations:
# Frame-to-frame consistency
previous_control = None
for frame in video_frames:
current_control = extract_pose(frame)
if previous_control:
# Smooth between frames
current_control = interpolate_controls(
previous_control, current_control, 0.3
)
generate_frame(current_control)
previous_control = current_control
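`interpolate_controls` is likewise a placeholder. For pose keypoints, a simple blend toward the current frame damps jitter between frames (a sketch, assuming controls are arrays of (x, y) keypoints):

```python
import numpy as np

def interpolate_controls(prev_kps, curr_kps, blend=0.3):
    # Keep `blend` of the previous frame's keypoints to smooth motion
    prev = np.asarray(prev_kps, dtype=np.float32)
    curr = np.asarray(curr_kps, dtype=np.float32)
    return blend * prev + (1 - blend) * curr

prev = np.array([[100.0, 100.0]])
curr = np.array([[110.0, 120.0]])
smoothed = interpolate_controls(prev, curr, blend=0.3)
```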
Interactive Control
# Real-time adjustment
class InteractiveControl:
def update_strength(self, value):
self.control_strength = value
self.regenerate()
def switch_preprocessor(self, new_type):
self.control_map = preprocess(self.input, new_type)
self.regenerate()
Best Practices
Do’s
✓ Match control resolution to model resolution
✓ Use appropriate preprocessor for input type
✓ Experiment with strength values
✓ Combine multiple controls thoughtfully
✓ Save successful control maps for reuse
Don’ts
✗ Don’t always use 100% control strength
✗ Don’t ignore prompt importance
✗ Don’t use incompatible model versions
✗ Don’t expect perfect results immediately
✗ Don’t overstack controls (3+ rarely helpful)
Future Developments
Emerging ControlNet Technologies
- 3D-Aware Control: Full 3D scene understanding with depth and normals
- Video ControlNet: Temporal consistency across frames
- Semantic Editing: Natural language region control
- Adaptive Control: AI-driven strength adjustment
- Neural Controls: Learned control patterns from examples
- Diffusion Illusions: Optical illusion generation
- Multi-Subject Control: Individual control over multiple subjects
- Mesh Control: Direct 3D mesh conditioning
Integration Trends
- Real-time Preview: See control effects instantly
- Mobile Optimization: On-device control processing
- Cloud Preprocessing: Offload heavy computation
- AI Control Generation: Generate control maps from prompts
- Cross-Model Support: Universal control formats
- Community Controls: User-created control types
Conclusion
ControlNet transforms diffusion models from probabilistic generators into precision tools. By understanding the various control types and their optimal applications, you can achieve unprecedented control over AI image generation while maintaining the creative capabilities of the base models.
The key to mastery is experimentation: try different preprocessors, adjust strengths, and combine controls creatively. As the technology evolves, ControlNet continues to bridge the gap between artistic vision and AI capabilities.
See Also
- Stable Diffusion Fundamentals - Understanding the base generation process
- ComfyUI Guide - Integrate ControlNet into advanced workflows
- LoRA Training - Combine LoRAs with ControlNet for custom styles
- Advanced Techniques - Multi-control and expert patterns
- Model Types - ControlNet model types and compatibility
- Base Models Comparison - ControlNet support across models
- AI Fundamentals - Core neural network concepts
- AI/ML Documentation Hub - Complete AI/ML documentation index