
Lunar Lander QD Optimization

Explore a CVT Archive of diverse landing strategies discovered through Quality-Diversity optimization. Each point represents an elite solution with unique behavioral characteristics.

[Interactive 3D view of the archive at epoch 1000. Fitness color scale: -300 to +300.]

Archive Stats

Epoch: 1000
Elites: 434
Coverage: 86.8%
Max Fitness: 263.9
Mean Fitness: -0.6

Behavior Space

Y-Velocity: [-3.0, 0.0]
X-Position: [-1.0, 1.0]
X-Velocity: [-1.0, 1.0]

Sample Elite Landings

Different elites from the archive exhibit diverse landing behaviors, yet all of them land successfully.

Elite landing demo 1

Environment Seed: 321

Stable descent with minimal horizontal drift

Elite landing demo 2

Environment Seed: 87

Aggressive correction under different wind conditions

The Problem

Traditional reinforcement learning typically converges on a single high-reward policy. But what if we want to discover many different ways to land successfully? Quality-Diversity (QD) optimization instead searches for a diverse collection of high-performing solutions.

In the LunarLander environment, the agent must land safely on a platform. Different landing behaviors (fast vs slow impact, left vs right position) can all be successful. QD discovers the full spectrum of viable strategies.

Quality-Diversity Optimization

CVT Archive

A Centroidal Voronoi Tessellation partitions the behavior space into cells. Each cell stores the best solution found for that behavioral niche.
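
To make the cell mechanics concrete, here is a minimal sketch in plain Python. It is not the project's implementation: `build_centroids`, `nearest`, and `TinyCVTArchive` are hypothetical names, the centroids are approximated with a few rounds of Lloyd's algorithm on uniform samples, and the bounds are taken from the Behavior Space panel above.

```python
import math
import random

# Behavior-space bounds from the page: Y-velocity, X-position, X-velocity.
BOUNDS = [(-3.0, 0.0), (-1.0, 1.0), (-1.0, 1.0)]


def nearest(centroids, point):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(centroids[i], point))


def build_centroids(n_cells, bounds, n_samples=2000, iters=10, seed=0):
    """Approximate CVT centroids with Lloyd's algorithm on uniform samples."""
    rng = random.Random(seed)
    samples = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_samples)]
    centroids = [list(s) for s in rng.sample(samples, n_cells)]
    for _ in range(iters):
        sums = [[0.0] * len(bounds) for _ in range(n_cells)]
        counts = [0] * n_cells
        for s in samples:
            i = nearest(centroids, s)
            counts[i] += 1
            for d, v in enumerate(s):
                sums[i][d] += v
        # Move each centroid to the mean of the samples assigned to it.
        for i in range(n_cells):
            if counts[i]:
                centroids[i] = [v / counts[i] for v in sums[i]]
    return centroids


class TinyCVTArchive:
    """Hypothetical archive: one elite per Voronoi cell."""

    def __init__(self, centroids):
        self.centroids = centroids
        self.elites = {}  # cell index -> (fitness, solution, descriptor)

    def add(self, solution, fitness, descriptor):
        """Insert if the cell is empty or the candidate beats its elite."""
        cell = nearest(self.centroids, descriptor)
        current = self.elites.get(cell)
        if current is None or fitness > current[0]:
            self.elites[cell] = (fitness, solution, descriptor)
            return True
        return False
```

A brute-force nearest-centroid search keeps the sketch short. With 500 cells a spatial index such as a k-d tree would be the more usual choice.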

Evolution Strategies

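Multiple emitters generate candidate solutions by mutating existing elites. Each candidate is evaluated and assigned to the cell that matches its behavior; it replaces that cell's elite only if it has higher fitness, so the archive improves in both quality and diversity.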
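
The "ask" side of this loop can be sketched as follows. Plain Gaussian mutation is used here as a stand-in; the page does not say which evolution strategy the emitters actually run, and `gaussian_emitter`, `ask_all`, and `sigma` are hypothetical names and values. The defaults of 5 emitters and a batch of 30 come from the Project Configuration.

```python
import random


def gaussian_emitter(elites, batch_size, sigma, rng):
    """Propose candidates by adding Gaussian noise to randomly chosen elites."""
    candidates = []
    for _ in range(batch_size):
        parent = rng.choice(elites)
        candidates.append([w + rng.gauss(0.0, sigma) for w in parent])
    return candidates


def ask_all(elites, n_emitters=5, batch_size=30, sigma=0.1, seed=0):
    """Collect one batch from every emitter (assumed: each emits `batch_size`)."""
    rng = random.Random(seed)
    batch = []
    for _ in range(n_emitters):
        batch.extend(gaussian_emitter(elites, batch_size, sigma, rng))
    return batch


# Example: one epoch's worth of candidates from a single 4-parameter elite.
candidates = ask_all([[0.0, 0.0, 0.0, 0.0]])
print(len(candidates))
```

Each candidate would then be rolled out in the environment, scored, and offered to the archive.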

Behavior Descriptors

Three measures capture landing behavior: vertical impact velocity, horizontal position, and horizontal velocity at touchdown.
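
One way to compute these three measures from a rollout is sketched below. It assumes the standard Gymnasium LunarLander observation layout (x, y, x-velocity, y-velocity, angle, angular velocity, left-leg contact, right-leg contact) and treats the first step with any leg contact as touchdown. Both the touchdown rule and the helper name `landing_descriptor` are assumptions, not details confirmed by the page.

```python
# Archive bounds from the page, in descriptor order:
# Y-velocity, X-position, X-velocity.
BOUNDS = [(-3.0, 0.0), (-1.0, 1.0), (-1.0, 1.0)]


def landing_descriptor(observations):
    """Return (y_velocity, x_position, x_velocity) at touchdown, clipped to BOUNDS.

    `observations` is the list of 8-element observations from one episode.
    If no leg ever touches the ground, the final observation is used.
    """
    touchdown = observations[-1]
    for obs in observations:
        if obs[6] > 0.5 or obs[7] > 0.5:  # either leg is in contact
            touchdown = obs
            break
    x, _, vx, vy = touchdown[:4]
    raw = (vy, x, vx)
    return tuple(min(max(v, lo), hi) for v, (lo, hi) in zip(raw, BOUNDS))


# Example: a three-step trajectory whose second step is the first contact.
trajectory = [
    [0.10, 1.0, 0.2, -0.5, 0.0, 0.0, 0.0, 0.0],
    [0.05, 0.0, 0.1, -0.3, 0.0, 0.0, 1.0, 0.0],
    [0.00, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0],
]
print(landing_descriptor(trajectory))
```

Clipping keeps out-of-range landings inside the archive bounds, so they fall into the boundary cells instead of being rejected.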

QD Score

Success is measured by both coverage (the fraction of archive cells that hold an elite) and quality (the sum of all elite fitnesses).
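
These statistics are simple to compute from the list of elite fitnesses. The sketch below uses a hypothetical helper, `archive_stats`. Its optional `offset` shifts every fitness before summing, a common convention that stops negative-fitness elites from lowering the score; whether this project applies one is not stated, so it defaults to 0.

```python
def archive_stats(fitnesses, n_cells, offset=0.0):
    """Coverage, QD score, max and mean fitness for one archive snapshot.

    `fitnesses` holds one value per filled cell; `offset` is subtracted
    from each fitness before summing into the QD score.
    """
    n = len(fitnesses)
    if n == 0:
        return {"coverage": 0.0, "qd_score": 0.0, "max": None, "mean": None}
    return {
        "coverage": n / n_cells,
        "qd_score": sum(f - offset for f in fitnesses),
        "max": max(fitnesses),
        "mean": sum(fitnesses) / n,
    }


# Toy archive: 3 of 5 cells filled.
print(archive_stats([10.0, -4.0, 6.0], n_cells=5))
# Same archive, shifted by the lower end of the fitness color scale (assumed).
print(archive_stats([10.0, -4.0, 6.0], n_cells=5, offset=-300.0))
```

Applied to the numbers in Archive Stats, 434 elites in 500 cells gives the reported coverage of 86.8%.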

Archive Insights

Exploring the archive reveals intuitive patterns about what makes a successful landing:

Soft Landings Score Higher

Elites with lower vertical impact velocity (closer to 0) consistently achieve higher fitness. This makes physical sense—gentler touchdowns avoid crash penalties and earn landing bonuses. The gradient from purple to yellow across the Y-velocity axis visualizes this directly.

Centered Landings Are Rewarded

Solutions landing near the center of the pad (X-position ≈ 0) tend to have higher fitness. The environment rewards precision—landing too far left or right wastes fuel on correction maneuvers and risks missing the target entirely.

Horizontal Stability Matters

Low horizontal velocity at impact correlates with higher rewards. A lander drifting sideways on touchdown is unstable and may tip over. The best elites approach the pad with minimal lateral movement.

Diversity Has Value

Despite the optimal behavior being a slow, centered landing, QD finds successful policies across the entire behavior space. These "suboptimal but viable" solutions are valuable—they demonstrate robustness and could be used when environmental conditions change.

Project Configuration

Parameter: Value
Archive Cells: 500
Epochs: 1000
Emitters: 5
Batch Size: 30
Environment: LunarLander-v3 (wind enabled)
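
For reference, the same settings can be written as a small Python config object. This is only an illustrative sketch: `QDConfig` and its field names are hypothetical, "wind enabled" is assumed to map to a boolean flag, and the evaluation budget assumes every emitter proposes one full batch per epoch, which the page does not confirm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QDConfig:
    """Hypothetical container for the settings listed above."""

    archive_cells: int = 500
    epochs: int = 1000
    emitters: int = 5
    batch_size: int = 30
    env_id: str = "LunarLander-v3"
    enable_wind: bool = True  # assumed mapping of "wind enabled"

    @property
    def evaluations_per_epoch(self) -> int:
        # Assumption: each emitter proposes `batch_size` candidates per epoch.
        return self.emitters * self.batch_size

    @property
    def total_evaluations(self) -> int:
        return self.epochs * self.evaluations_per_epoch


config = QDConfig()
print(config.evaluations_per_epoch, config.total_evaluations)
```

Under that assumption the run evaluates 150 candidates per epoch, or 150,000 episodes over 1000 epochs.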