Diffusion Policy

Article Goal

Explain the Diffusion Policy algorithm for visuomotor policy learning in robotics

What is Diffusion Policy?

Diffusion Policy is an approach to robot behavior generation that represents policies as conditional denoising diffusion processes. Instead of directly predicting actions from observations, Diffusion Policy learns to gradually denoise random noise into coherent action sequences, conditioned on visual observations and robot states.

The key insight is that action generation can be framed as a generative modeling problem: the policy learns the distribution of expert actions and samples from this learned distribution during execution. This approach naturally handles multimodal behaviors and avoids the mode averaging that plagues traditional behavior cloning methods with unimodal outputs.

Diffusion Policy: Written Walkthrough

Core Concepts

  1. Action Diffusion - Models the policy as a conditional diffusion process that generates action sequences by iteratively denoising samples of Gaussian noise.
  2. Multimodal Behavior Learning - Captures multiple valid ways to perform a task without averaging them out.
  3. Receding Horizon Control - Predicts sequences of future actions but only executes the first few, then re-plans.
  4. Visual Conditioning - Conditions the diffusion process on camera observations and proprioceptive state.

The Math

Forward Diffusion Process

The forward process gradually adds noise to action sequences: \(q(A_k | A_{k-1}) = \mathcal{N}(A_k; \sqrt{1-\beta_k}A_{k-1}, \beta_k I)\)

Where:

  • \(A_0\) is the original action sequence
  • \(A_k\) is the noisy version at diffusion step \(k\)
  • \(\beta_k\) is the noise variance added at step \(k\), taken from a predefined noise schedule
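
Because each step adds independent Gaussian noise, the per-step transitions compose into the closed form \(A_k = \sqrt{\bar{\alpha}_k}\,A_0 + \sqrt{1-\bar{\alpha}_k}\,\epsilon\) with \(\alpha_k = 1-\beta_k\) and \(\bar{\alpha}_k = \prod_{i=1}^{k}\alpha_i\), which is what training code typically uses. Below is a minimal PyTorch sketch of this forward noising; the linear schedule, step count, and tensor shapes are illustrative assumptions, not details of the original implementation.

```python
import torch

def make_noise_schedule(num_steps: int = 100, beta_start: float = 1e-4, beta_end: float = 2e-2):
    """Linear beta schedule (an illustrative choice) plus the derived alpha terms."""
    betas = torch.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)   # \bar{alpha}_k = prod_i alpha_i
    return betas, alphas, alpha_bars

def forward_noise(actions: torch.Tensor, k: torch.Tensor, alpha_bars: torch.Tensor):
    """Sample A_k ~ q(A_k | A_0) in closed form.

    actions:    (B, T_a, action_dim) clean action sequences A_0
    k:          (B,) integer diffusion step per sample
    alpha_bars: (K,) cumulative products of alpha
    """
    eps = torch.randn_like(actions)             # the noise the network will learn to predict
    a_bar = alpha_bars[k].view(-1, 1, 1)        # broadcast over time and action dimensions
    noisy = a_bar.sqrt() * actions + (1.0 - a_bar).sqrt() * eps
    return noisy, eps
```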

Reverse Diffusion Process

The reverse process learns to denoise: \(p_\theta(A_{k-1} | A_k, O) = \mathcal{N}(A_{k-1}; \mu_\theta(A_k, O, k), \Sigma_\theta(A_k, O, k))\)

Where:

  • \(O\) represents observations (visual + proprioceptive)
  • \(\theta\) are the neural network parameters

Noise Prediction

The neural network learns to predict the noise that was added: \(\epsilon_\theta(A_k, O, k) \approx \epsilon\)

Where \(\epsilon\) is the noise that was added to create \(A_k\) from \(A_0\).
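
In the original work, \(\epsilon_\theta\) is implemented as a 1D temporal U-Net or a transformer over the action sequence, conditioned on observation features. The sketch below replaces that with a small MLP purely to make the input/output contract concrete; the layer sizes, step-index embedding, and concatenation-based conditioning are illustrative assumptions, not the original architecture.

```python
import torch
import torch.nn as nn

class NoisePredictionMLP(nn.Module):
    """Stand-in for epsilon_theta(A_k, O, k): predicts the noise added to a noisy
    action sequence, conditioned on an observation feature vector and the step k."""

    def __init__(self, action_horizon: int, action_dim: int, obs_dim: int,
                 num_steps: int = 100, hidden: int = 256):
        super().__init__()
        self.action_horizon = action_horizon
        self.action_dim = action_dim
        self.step_embed = nn.Embedding(num_steps, hidden)     # embeds the integer step k
        in_dim = action_horizon * action_dim + obs_dim + hidden
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.Mish(),
            nn.Linear(hidden, hidden), nn.Mish(),
            nn.Linear(hidden, action_horizon * action_dim),
        )

    def forward(self, noisy_actions: torch.Tensor, obs: torch.Tensor, k: torch.Tensor):
        # noisy_actions: (B, T_a, action_dim), obs: (B, obs_dim), k: (B,) long
        x = torch.cat([noisy_actions.flatten(1), obs, self.step_embed(k)], dim=-1)
        return self.net(x).view(-1, self.action_horizon, self.action_dim)
```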

Training Loss

The training objective is: \(\mathcal{L} = \mathbb{E}_{A_0, \epsilon, k, O} \left[ \|\epsilon - \epsilon_\theta(A_k, O, k)\|^2 \right]\)
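
A minimal training step under the same assumptions: draw a random step \(k\), noise the demonstration actions with the closed form from above, and regress the predicted noise onto the true noise. The observation encoder is treated as a black box that already returns a feature vector.

```python
import torch
import torch.nn.functional as F

def training_step(model, optimizer, actions, obs_features, alpha_bars):
    """One gradient step of L = E[ || eps - eps_theta(A_k, O, k) ||^2 ].

    actions:      (B, T_a, action_dim) expert action sequences A_0
    obs_features: (B, obs_dim) encoded visual + proprioceptive observations
    alpha_bars:   (K,) cumulative alpha products from the noise schedule
    """
    num_steps = alpha_bars.shape[0]
    k = torch.randint(0, num_steps, (actions.shape[0],), device=actions.device)

    # Forward-noise A_0 to A_k in closed form (same as the forward_noise sketch above).
    eps = torch.randn_like(actions)
    a_bar = alpha_bars[k].view(-1, 1, 1)
    noisy_actions = a_bar.sqrt() * actions + (1.0 - a_bar).sqrt() * eps

    loss = F.mse_loss(model(noisy_actions, obs_features, k), eps)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```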

Sampling Process

During inference, actions are generated by:

  1. Start with random noise: \(A_K \sim \mathcal{N}(0, I)\)
  2. Iteratively denoise: \(A_{k-1} = \frac{1}{\sqrt{\alpha_k}}(A_k - \frac{\beta_k}{\sqrt{1-\bar{\alpha}_k}}\epsilon_\theta(A_k, O, k)) + \sigma_k z\)
  3. Output final action sequence: \(A_0\)

Where \(\alpha_k = 1 - \beta_k\), \(\bar{\alpha}_k = \prod_{i=1}^k \alpha_i\), and \(z \sim \mathcal{N}(0, I)\) (with \(z = 0\) at the final step \(k = 1\)).
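
The sketch below implements this sampling loop directly, using \(\sigma_k^2 = \beta_k\) (a common DDPM choice; the scheduler settings used in practice may differ) and adding no noise at the final step.

```python
import torch

@torch.no_grad()
def sample_actions(model, obs_features, betas, action_horizon: int, action_dim: int):
    """Generate an action sequence A_0 by iterative denoising, conditioned on obs_features."""
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    batch = obs_features.shape[0]

    # Start from pure Gaussian noise: A_K ~ N(0, I)
    A = torch.randn(batch, action_horizon, action_dim, device=obs_features.device)
    for k in reversed(range(betas.shape[0])):
        k_batch = torch.full((batch,), k, dtype=torch.long, device=A.device)
        eps_pred = model(A, obs_features, k_batch)
        mean = (A - betas[k] / torch.sqrt(1.0 - alpha_bars[k]) * eps_pred) / torch.sqrt(alphas[k])
        if k > 0:
            A = mean + torch.sqrt(betas[k]) * torch.randn_like(A)  # sigma_k^2 = beta_k
        else:
            A = mean                                               # no noise at the final step
    return A
```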

Action Chunking

The policy predicts action sequences of length \(T_a\): \(A = [a_t, a_{t+1}, ..., a_{t+T_a-1}]\)

But only executes the first \(T_e\) actions before replanning.
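
A sketch of the resulting receding-horizon loop, reusing the `sample_actions` sketch above; `env` and `encode_obs` are hypothetical stand-ins for the real robot interface and observation encoder, and the horizon lengths are illustrative.

```python
def run_policy(env, encode_obs, model, betas, action_dim,
               action_horizon=16, exec_horizon=8, max_replans=200):
    """Receding-horizon execution: predict T_a actions, execute the first T_e, then replan.

    env and encode_obs are hypothetical interfaces: env.reset()/env.step(a) stand in for
    the robot (or simulator), and encode_obs maps raw observations to a (1, obs_dim) tensor.
    """
    obs = env.reset()
    for _ in range(max_replans):
        obs_features = encode_obs(obs)                           # visual + proprioceptive features
        plan = sample_actions(model, obs_features, betas,
                              action_horizon, action_dim)[0]     # (T_a, action_dim)
        for action in plan[:exec_horizon]:                       # execute only the first T_e actions
            obs, done = env.step(action)
            if done:
                return
```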

Algorithm Steps

  1. Data Collection - Gather expert demonstrations with observations and action sequences
  2. Noise Schedule Setup - Define diffusion timesteps and noise schedule \(\beta_1, ..., \beta_K\)
  3. Neural Network Training - Train noise prediction network \(\epsilon_\theta\) on demonstration data
  4. Inference Initialization - Start with random noise \(A_K \sim \mathcal{N}(0, I)\)
  5. Denoising Loop - For \(k = K, K-1, ..., 1\): predict noise and update \(A_{k-1}\)
  6. Action Execution - Execute first \(T_e\) actions from denoised sequence \(A_0\)
  7. Replanning - Observe new state and repeat sampling process
  8. Repeat - Continue until task completion

Advantages of Diffusion Policy

  1. Multimodal Behavior - Naturally learns and executes diverse behavioral modes without mode collapse
  2. High-Quality Trajectories - Generates smooth, high-quality action sequences
  3. Visual Robustness - Handles complex visual observations effectively
  4. Stable Training - The noise-prediction regression objective trains more stably than adversarial (GAN-based) alternatives and avoids their balancing issues
  5. Expressiveness - Can represent complex, multimodal action distributions
  6. Strong Empirical Performance - The original paper reports an average improvement of 46.9% over prior state-of-the-art methods across its benchmark suite

Limitations

  1. Computational Cost - Requires multiple denoising steps at inference time, making it slower than direct action prediction
  2. Hyperparameter Sensitivity - Performance depends on noise schedule, diffusion steps, and architecture choices
  3. Training Time - Requires more training time than simple behavioral cloning approaches
  4. Memory Requirements - Needs to store and process action sequences rather than single actions
  5. Limited Real-Time Applicability - Iterative sampling latency can be a bottleneck for tasks that demand high-frequency control
  6. Architecture Complexity - More complex to implement than standard imitation learning approaches