Model Predictive Path Integral (MPPI)

3 minute read

Published: August 13, 2025

Article Goal

Explain MPPI (Model Predictive Path Integral) algorithm for stochastic optimal control

What is MPPI?

MPPI is a sampling-based model predictive control algorithm that combines path integral formulations from statistical physics with model predictive control. MPPI performs stochastic optimization by sampling thousands of control trajectories, evaluating their costs using forward dynamics models, and then computing control inputs as importance-weighted averages of the samples.

Unlike traditional gradient-based MPC methods, MPPI is derivative-free and can handle non-convex cost functions, making it particularly suitable for complex robotic systems with constraints and obstacles. The algorithm leverages parallel computing (especially GPUs) to evaluate many rollouts simultaneously, enabling real-time performance.

MPPI: Written Walkthrough

Core Concepts

Sampling-Based Optimization - Generates thousands of random control trajectories rather than using gradient-based optimization.
Information-Theoretic Control - Uses principles from statistical physics, specifically the relationship between free energy and relative entropy.
Importance Sampling - Weights trajectories based on their cost, giving higher importance to lower-cost rollouts.
Parallel Computation - Leverages GPU parallelization to evaluate many rollouts simultaneously for real-time performance.

The Math

System Dynamics

Consider a discrete-time stochastic system: \(x_{t+1} = f(x_t, u_t) + w_t\)

Where \(w_t\) represents system noise or disturbances.

Cost Function

The total cost for a trajectory is: \(S(\tau) = \phi(x_T) + \sum_{t=0}^{T-1} q(x_t, u_t)\)

Where:

\(\phi(x_T)\) is the terminal cost
\(q(x_t, u_t)\) is the running cost
\(\tau\) represents a complete trajectory

Control Perturbations

MPPI generates control sequences by adding noise to a nominal control: \(u_t^{(i)} = \bar{u}_t + \epsilon_t^{(i)}\)

Where:

\(\bar{u}_t\) is the nominal control at time \(t\)
\(\epsilon_t^{(i)} \sim \mathcal{N}(0, \Sigma)\) is sampled noise for rollout \(i\)

Importance Weights

The weight for each trajectory is computed using: \(w^{(i)} = \frac{\exp(-\frac{1}{\lambda} S(\tau^{(i)}))}{\sum_{j=1}^{K} \exp(-\frac{1}{\lambda} S(\tau^{(j)}))}\)

Where:

\(\lambda\) is the temperature parameter controlling exploration vs. exploitation
\(K\) is the total number of rollouts
Lower costs result in higher weights

Control Update

The optimal control is computed as the weighted average: \(u_t^* = \bar{u}_t + \sum_{i=1}^{K} w^{(i)} \epsilon_t^{(i)}\)

Path Integral Formulation

The theoretical foundation comes from the path integral: \(u_t^* = \bar{u}_t + \frac{\int \epsilon_t \exp(-\frac{1}{\lambda} S(\tau)) d\tau}{\int \exp(-\frac{1}{\lambda} S(\tau)) d\tau}\)

Which MPPI approximates through Monte Carlo sampling.

Algorithm Steps

Initialize - Set nominal control sequence \(\bar{u}_0, \bar{u}_1, ..., \bar{u}_{T-1}\)
Sample Rollouts - Generate \(K\) control sequences by adding Gaussian noise: \(u_t^{(i)} = \bar{u}_t + \epsilon_t^{(i)}\)
Forward Simulation - For each rollout, simulate system dynamics to get trajectory \(\tau^{(i)}\)
Cost Evaluation - Compute cost \(S(\tau^{(i)})\) for each trajectory
Compute Weights - Calculate importance weights \(w^{(i)}\) based on trajectory costs
Update Control - Compute weighted average: \(u_0^* = \bar{u}_0 + \sum_{i=1}^{K} w^{(i)} \epsilon_0^{(i)}\)
Apply Control - Execute \(u_0^*\) and shift time horizon forward
Repeat - Update nominal sequence and repeat for next time step

Advantages of MPPI

Derivative-Free - No need for cost function gradients or dynamics linearization
Handles Non-Convexity - Can optimize non-convex and discontinuous cost functions
Real-Time Capability - GPU parallelization enables real-time performance
Constraint Handling - Easily incorporates hard constraints through cost penalties
Robust to Model Uncertainty - Stochastic sampling provides natural robustness
Implementation Simplicity - Straightforward to implement compared to gradient-based methods

Limitations

Sample Complexity - Requires many rollouts for good performance, computationally expensive
Hyperparameter Sensitivity - Performance depends on proper tuning of λ, Σ, and K
Short-Sighted - Limited by prediction horizon length
Noise Assumptions - Assumes Gaussian noise, may not suit all applications
Memory Requirements - Storing thousands of rollouts can be memory-intensive
Local Optima - Can get stuck in local minima despite stochastic exploration

Share on

Bluesky Facebook LinkedIn X (formerly Twitter)

Neal Ramaswamy

Model Predictive Path Integral (MPPI)

Article Goal

What is MPPI?

Core Concepts

The Math

System Dynamics

Cost Function

Control Perturbations

Importance Weights

Control Update

Path Integral Formulation

Algorithm Steps

Advantages of MPPI

Limitations

Share on

You May Also Enjoy

Robot Arm 2D Forward / Inverse Kinematics

Model Predictive Control in Basic Autonomous Vehicle

Kalman Filter

Article Goal

Proportional-Integral-Derivative (PID) Control

Article Goal