Choosing an algorithm
The README has a compact decision tree. This chapter expands it with the reasoning behind each branch.
Step 0: How expensive is one evaluation?
This is the first fork because it changes everything that comes after it.
| Eval cost | Budget you can afford | Algorithm family |
|---|---|---|
| Microseconds (pure math) | 10 000 – 1 000 000 evals | Population-based |
| Milliseconds (sim, IO) | 1 000 – 10 000 evals | Population-based |
| Seconds (small training) | 100 – 1 000 evals | Sample-efficient (BO, TPE) |
| Minutes+ (full training) | 50 – 500 evals | Sample-efficient + multi-fidelity |
For the cheap-eval branch, you have the run of the catalog. For the
expensive branch, classical evolutionary methods waste your evaluation
budget — go to Bayesian Optimization or TPE. For the very expensive
branch where each eval has a tunable budget (epochs, MC samples, sim
steps), Hyperband over the PartialProblem trait is the move.
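To make the multi-fidelity idea concrete, this is the successive-halving schedule Hyperband runs inside one bracket: start many configurations on a small budget, keep the best 1/eta at each rung, and multiply the budget by eta. A standalone sketch (the function name and constants are illustrative, not heuropt API):

```rust
// Successive-halving schedule for one Hyperband bracket (illustrative).
// Returns (number of configs, per-config budget) for each rung.
fn halving_schedule(r_max: u32, eta: u32) -> Vec<(u32, u32)> {
    // number of halvings: floor(log_eta(r_max))
    let s = (r_max as f64).log(eta as f64).floor() as u32;
    let n0 = eta.pow(s); // start wide: eta^s configs at the smallest budget
    (0..=s)
        .map(|i| (n0 / eta.pow(i), r_max / eta.pow(s - i)))
        .collect()
}

fn main() {
    for (n, r) in halving_schedule(81, 3) {
        println!("{n} configs x {r} epochs");
    }
}
```

With r_max = 81 and eta = 3 the bracket spends its budget as 81×1, 27×3, 9×9, 3×27, 1×81: most configurations are discarded after seeing only a sliver of the full budget, which is where the savings come from.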
Step 1: How many objectives?
The biggest fork.
- One — there's a single best answer. Pick from the single-objective branch.
- Two or three — a Pareto front. Pick from the multi-objective branch.
- Four or more — a many-objective Pareto front; classical multi-objective methods break down here because almost every pair of points is non-dominated. Pick from the many-objective branch.
Pareto front: the set of decisions where you cannot improve any objective without sacrificing another. In a 2-objective minimize problem, plot every solution; the Pareto front is the lower-left envelope.
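The definition translates directly into code. A naive O(n²) filter for the 2-objective minimize case (illustrative only; the library's non-dominated sorting is the production version):

```rust
// Naive Pareto-front filter for 2-objective minimization (sketch only).
fn dominates(a: &[f64; 2], b: &[f64; 2]) -> bool {
    // a dominates b: no worse on both objectives, strictly better on one
    a[0] <= b[0] && a[1] <= b[1] && (a[0] < b[0] || a[1] < b[1])
}

fn pareto_front(points: &[[f64; 2]]) -> Vec<[f64; 2]> {
    points
        .iter()
        .filter(|&p| !points.iter().any(|q| dominates(q, p)))
        .copied()
        .collect()
}

fn main() {
    let pts = [[1.0, 5.0], [2.0, 2.0], [3.0, 3.0], [5.0, 1.0]];
    // [3, 3] is dominated by [2, 2]; the other three form the lower-left envelope
    println!("{:?}", pareto_front(&pts));
}
```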
If you find yourself staring at a single composite score that's a weighted sum of conflicting goals, you probably have a multi-objective problem in disguise. A weighted sum bakes in your preferences before you've seen the trade-off; running a multi-objective optimizer first and picking off the front later is almost always a better workflow (see Pick one answer off a Pareto front).
Step 2 — single-objective continuous
These all take Vec<f64> decisions.
Smooth, low-to-moderate dimension
CMA-ES is the strong default. It adapts the search distribution's covariance to the local landscape. On the comparison harness it hits machine epsilon on Rosenbrock at 30 000 evaluations.
For very low-dimensional smooth problems (≤ 5 dim), Nelder-Mead is deterministic and converges to f = 0 exactly on Rosenbrock.
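For reference, both claims are about the standard d-dimensional Rosenbrock formulation, whose global minimum is f = 0 at x = (1, …, 1):

```rust
// d-dimensional Rosenbrock: f(x) = sum_i 100*(x[i+1] - x[i]^2)^2 + (1 - x[i])^2
fn rosenbrock(x: &[f64]) -> f64 {
    x.windows(2)
        .map(|w| 100.0 * (w[1] - w[0] * w[0]).powi(2) + (1.0 - w[0]).powi(2))
        .sum()
}

fn main() {
    assert_eq!(rosenbrock(&[1.0; 30]), 0.0); // global minimum at (1, ..., 1)
    println!("f at the origin: {}", rosenbrock(&[0.0; 30]));
}
```

The narrow curved valley between the origin and the minimum is what makes covariance adaptation (CMA-ES) pay off here.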
High dimension, smooth
sNES uses a diagonal covariance — cheaper per step than
CMA-ES at the cost of being unable to model rotated landscapes. Worth
trying when CMA-ES's O(d²) per-step cost hurts.
Multimodal landscapes
Multimodal = many local minima that aren't the global one. Rastrigin and Ackley are classic traps.
IPOP-CMA-ES is CMA-ES with an increasing-population restart strategy specifically designed for this. On the harness it drops vanilla CMA-ES's Rastrigin score from f = 2.35 to f = 0.13.
Differential Evolution is rarely beaten on cheap multimodal
continuous problems. On Rastrigin it ties with (1+1)-ES at f = 0.
Simulated Annealing is a cheap, generic baseline that escapes local optima via temperature decay.
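The temperature-decay mechanism fits in a dozen lines. A deterministic sketch (the cooling rate, step-size rule, and the inlined xorshift RNG are all illustrative choices, not heuropt's SimulatedAnnealing):

```rust
// Bare-bones simulated annealing with geometric cooling, minimizing a
// 1-D multimodal function. Sketch only; constants are illustrative.
fn sa_minimize<F: Fn(f64) -> f64>(f: F, mut x: f64, seed: u64) -> f64 {
    let (mut t, cool) = (1.0_f64, 0.995);
    let mut rng = seed;
    let mut next = || {
        // xorshift64: cheap deterministic pseudo-randoms in [0, 1)
        rng ^= rng << 13; rng ^= rng >> 7; rng ^= rng << 17;
        (rng >> 11) as f64 / (1u64 << 53) as f64
    };
    for _ in 0..20_000 {
        let cand = x + (next() - 0.5) * t * 4.0; // step size shrinks with T
        let delta = f(cand) - f(x);
        // accept downhill always, uphill with probability exp(-delta / T)
        if delta < 0.0 || next() < (-delta / t).exp() {
            x = cand;
        }
        t *= cool;
    }
    x
}

fn main() {
    // Rastrigin-like trap in 1-D: many local minima, global minimum at x = 0
    let f = |x: f64| x * x + 10.0 * (1.0 - (2.0 * std::f64::consts::PI * x).cos());
    let best = sa_minimize(f, 3.5, 42);
    println!("best x = {best:.3}, f = {:.3}", f(best));
}
```

At high temperature the uphill-acceptance probability exp(-delta/T) is close to 1 and the walk escapes basins; as T decays, only improvements survive and the search settles.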
Want something parameter-free
TLBO (Teaching-Learning-Based Optimization) has no F, CR, w,
or σ to tune. Often a respectable middle-of-the-pack performer.
Smallest possible self-adapting baseline
(1+1)-ES — Rechenberg's 1973 scheme with the one-fifth success
rule. On the harness it hits f = 0 on Rastrigin in 50 000
evaluations.
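The whole algorithm is small enough to show in full, which is part of its appeal as a baseline. A self-contained sketch on the sphere function (window length and step-size multipliers are illustrative; the library version differs in details):

```rust
// Minimal (1+1)-ES with Rechenberg's one-fifth success rule on the
// sphere function f(x) = sum x_i^2. Sketch only.
fn one_plus_one_es(dim: usize, iters: usize, seed: u64) -> f64 {
    let sphere = |x: &[f64]| x.iter().map(|v| v * v).sum::<f64>();
    let mut rng = seed;
    let mut gauss = || {
        // Box-Muller over two xorshift uniforms in (0, 1]
        let mut u = || {
            rng ^= rng << 13; rng ^= rng >> 7; rng ^= rng << 17;
            ((rng >> 11) as f64 + 1.0) / (1u64 << 53) as f64
        };
        (-2.0 * u().ln()).sqrt() * (2.0 * std::f64::consts::PI * u()).cos()
    };
    let mut x = vec![1.0; dim];
    let (mut sigma, mut successes) = (0.5, 0usize);
    for i in 1..=iters {
        let cand: Vec<f64> = x.iter().map(|v| v + sigma * gauss()).collect();
        if sphere(&cand) < sphere(&x) {
            x = cand;
            successes += 1;
        }
        if i % 50 == 0 {
            // one-fifth rule: grow sigma if more than 1/5 of the last
            // 50 steps succeeded, shrink it otherwise
            sigma *= if successes * 5 > 50 { 1.5 } else { 0.82 };
            successes = 0;
        }
    }
    sphere(&x)
}

fn main() {
    println!("f after 5000 iters = {:e}", one_plus_one_es(10, 5000, 7));
}
```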
Just want a baseline
Random Search. Useful as a sanity check: if your fancy optimizer can't beat random search, something is wrong (with the fancy optimizer or with the problem).
Step 2 — single-objective other types
| Decision type | Algorithm | Notes |
|---|---|---|
| Vec<bool> | UMDA | Per-bit marginal EDA. Independent-bit assumption. |
| Vec<bool> | GA + BitFlipMutation | When bit interactions matter. |
| Vec<usize> (permutation) | Ant Colony | TSP-style with a distance matrix. |
| Vec<usize> (permutation) | GA + ShuffledPermutation + OrderCrossover + InversionMutation | Generic permutation GA; use EdgeRecombinationCrossover for TSP-shaped instances. |
| Vec<usize> (JSS multiset) | Simulated Annealing / Tabu Search with InsertionMutation, or GA + ShuffledMultisetPermutation + local POX | Operation-string encoding. On the FT06 harness the local-search pair edges out the GA — see Optimize a permutation. |
| Vec<usize> (permutation) | Simulated Annealing + InversionMutation | Strong on sequencing, not just a baseline — wins the harness's FT06 job-shop table and ties for the TSP optimum. |
| Vec<usize> or custom | Tabu Search | You supply the neighbor function; consistently near the top on the TSP and JSS tables. |
| Custom struct | Simulated Annealing / Hill Climber | With your own Variation impl. |
heuropt's permutation operator toolkit covers four crossovers
(OrderCrossover, PartiallyMappedCrossover, CycleCrossover,
EdgeRecombinationCrossover) and four mutations (SwapMutation,
InversionMutation, InsertionMutation, ScrambleMutation),
plus two initializers for strict and multiset permutations. See
Optimize a permutation for the full
picker.
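As a taste of the toolkit, InversionMutation is just a segment reversal. Shown with fixed indices so the effect is visible (the real operator draws i and j at random):

```rust
// What InversionMutation does in isolation: reverse one segment.
fn inversion(perm: &mut [usize], i: usize, j: usize) {
    perm[i..=j].reverse();
}

fn main() {
    let mut tour = vec![0, 1, 2, 3, 4, 5];
    inversion(&mut tour, 1, 4);
    // reversing positions 1..=4 turns 0-1-2-3-4-5 into 0-4-3-2-1-5
    println!("{tour:?}");
}
```

For a TSP tour, a segment reversal replaces exactly two edges (it is the classic 2-opt move), which is why inversion is so effective on sequencing problems.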
Step 2 — multi-objective (2 or 3)
Strong default
MOEA/D is the most consistent performer on the harness. It decomposes the problem into many scalar sub-problems (Tchebycheff or weighted sum) and solves them in parallel — fast per generation, and robust: it finishes top-3 on every multi- and many-objective table (convex, disconnected, spherical and linear fronts; 2 through 10 objectives) and is consistently the fastest or near-fastest. It rarely wins a table outright — a specialist usually does — but it never lands badly. One caveat from the literature: MOEA/D's spread depends on the weight-vector distribution and the scalarizing function, so it can leave gaps on highly irregular or degenerate fronts; the DTLZ/ZDT suite here doesn't stress that.
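For intuition, the Tchebycheff scalarization turns one weight vector w and the ideal point z* into a single scalar to minimize: g(x | w, z*) = max_i w_i |f_i(x) − z*_i|. A sketch of just the aggregation (names are illustrative, not heuropt API):

```rust
// Tchebycheff aggregation: the scalar each MOEA/D sub-problem minimizes.
fn tchebycheff(f: &[f64], w: &[f64], z_star: &[f64]) -> f64 {
    f.iter()
        .zip(w)
        .zip(z_star)
        .map(|((fi, wi), zi)| wi * (fi - zi).abs())
        .fold(f64::NEG_INFINITY, f64::max)
}

fn main() {
    // two candidates against weight (0.5, 0.5) and ideal point (0, 0)
    let g1 = tchebycheff(&[1.0, 3.0], &[0.5, 0.5], &[0.0, 0.0]); // max(0.5, 1.5)
    let g2 = tchebycheff(&[2.0, 2.0], &[0.5, 0.5], &[0.0, 0.0]); // max(1.0, 1.0)
    println!("g1 = {g1}, g2 = {g2}"); // the balanced point wins this sub-problem
}
```

Each weight vector carves out one region of the front; the spread caveat above is exactly that the front coverage inherits whatever distribution the weight vectors have.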
NSGA-II is the other safe default — the canonical Pareto-based
EA: fast, well-understood, diversity-preserving via crowding distance.
On the harness it's edged out by MOEA/D on every multi-objective table
and degrades past ~4 objectives (see the many-objective section), but it
stays a solid 2–3-objective pick and is the established choice for
combinatorial encodings: drop in ShuffledPermutation + a
permutation crossover and it solves bi-objective TSP; drop in a binary
initializer and BitFlipMutation and it solves bi-objective knapsack.
See Multi-objective combinatorial problems.
Real-valued, smooth front, want best convergence
MOPSO (multi-objective PSO with archive). On ZDT1 it wins hypervolume outright and converges 100× tighter than the dominance-based methods.
Better front quality than the default
IBEA (indicator-based) is consistently the best of the dominance-based methods on the harness — wins ZDT3 hypervolume and DTLZ2 mean distance by 24×. It uses an additive ε-indicator for selection rather than dominance + crowding.
SPEA2 (strength + density) — solid alternative; explicit external archive separate from the population.
SMS-EMOA uses exact hypervolume contribution for selection. Elegant in theory; in practice on the harness budgets here it underperforms NSGA-II. Worth the higher per-step cost only when exact HV contribution is the right discriminator.
Disconnected or non-convex front
A disconnected front (separate arcs, like ZDT3) and a non-convex but contiguous front are different problems — don't conflate them.
For a disconnected front, IBEA is the clear pick: on the harness it wins ZDT3 — the disconnected-front benchmark — outright on hypervolume, with MOEA/D and NSGA-II close behind. Counter-intuitively the geometry-aware methods below trail here: estimating a single front geometry or chasing knee points doesn't help when the front is in pieces (on ZDT3, AGE-MOEA and KnEA finish last).
For a non-convex but contiguous front:
AGE-MOEA estimates the front geometry adaptively (the L_p
parameter p is fit from data each generation).
KnEA favors knee points — the regions of the front where small gains in one objective cost large losses in another.
Region-based diversity
PESA-II uses grid hyperboxes to drive selection — divide the objective space into a grid, pick from the least-crowded boxes.
ε-MOEA uses an ε-grid archive that auto-limits its size.
Just one starting decision (no population budget)
PAES — (1+1)-ES with a Pareto archive. Cheap, simple, useful
when your evaluations are expensive enough that you can't afford a
population.
Step 2 — many-objective (4+)
Strong default
MOEA/D again. Decomposition sidesteps the dominance resistance that breaks Pareto-based methods at high objective count — each scalar sub-problem still has a clear best, even when almost every pair of solutions is mutually non-dominated. On the harness it is #2 on every many-objective table (DTLZ2 at 4 and 10 objectives, DTLZ1 at 8), and fast every time. NSGA-II is the cautionary tale: on DTLZ2 at 10 objectives it finishes last — behind random search — because its crowding distance has no dominance signal left to refine.
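Dominance resistance is easy to demonstrate: draw random points in the unit hypercube and count how many pairs are mutually non-dominated. A sketch (point count and the inlined RNG are arbitrary choices):

```rust
// Fraction of mutually non-dominated pairs among uniform random points.
fn dominates(a: &[f64], b: &[f64]) -> bool {
    a.iter().zip(b).all(|(x, y)| x <= y) && a.iter().zip(b).any(|(x, y)| x < y)
}

fn nondominated_pair_fraction(n_points: usize, m_obj: usize, seed: u64) -> f64 {
    let mut rng = seed;
    let mut u = move || {
        // xorshift64 uniform in [0, 1)
        rng ^= rng << 13; rng ^= rng >> 7; rng ^= rng << 17;
        (rng >> 11) as f64 / (1u64 << 53) as f64
    };
    let pts: Vec<Vec<f64>> = (0..n_points)
        .map(|_| (0..m_obj).map(|_| u()).collect())
        .collect();
    let (mut incomparable, mut total) = (0u64, 0u64);
    for i in 0..n_points {
        for j in (i + 1)..n_points {
            total += 1;
            if !dominates(&pts[i], &pts[j]) && !dominates(&pts[j], &pts[i]) {
                incomparable += 1;
            }
        }
    }
    incomparable as f64 / total as f64
}

fn main() {
    for m in [2, 4, 10] {
        println!("{m} objectives: {:.2} of pairs non-dominated",
                 nondominated_pair_fraction(200, m, 1));
    }
}
```

At 2 objectives roughly half of random pairs are incomparable; at 10, nearly all of them are. That is the regime where crowding distance is the only signal NSGA-II has left, and why decomposition or indicators are needed instead.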
Linear / simplex-shaped front (e.g., DTLZ1)
GrEA — grid coords drive ranking. On 3-objective DTLZ1 it beats NSGA-III by 3× and AGE-MOEA by 2.5×, and it wins the 8-objective DTLZ1 table outright.
MOEA/D — also #2 on both DTLZ1 tables.
Curved / unknown front geometry
NSGA-III — reference-point niching; the canonical many-objective method by reputation, though on the harness MOEA/D outperforms it on every table. Reach for it when you specifically want reference-point niching.
AGE-MOEA — estimates L_p geometry per generation.
RVEA — reference vectors with adaptive penalty.
Indicator-based selection
IBEA — additive ε-indicator; doesn't degrade at high obj count.
HypE — Monte Carlo hypervolume estimation; scales to arbitrary objective count where exact HV is too expensive.
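The estimator behind HypE is ordinary Monte Carlo: sample the box under the reference point and count samples dominated by the front. Shown in 2D, where the exact answer is checkable by hand (function name and the inlined RNG are illustrative; HypE's value is that the same estimator scales to 10+ objectives):

```rust
// Monte Carlo hypervolume estimate for a 2-objective minimization front.
fn mc_hypervolume(front: &[[f64; 2]], reference: [f64; 2], samples: u64, seed: u64) -> f64 {
    let mut rng = seed;
    let mut u = move || {
        // xorshift64 uniform in [0, 1)
        rng ^= rng << 13; rng ^= rng >> 7; rng ^= rng << 17;
        (rng >> 11) as f64 / (1u64 << 53) as f64
    };
    let mut hits = 0u64;
    for _ in 0..samples {
        let p = [u() * reference[0], u() * reference[1]];
        // count the sample if some front point dominates it (minimization)
        if front.iter().any(|f| f[0] <= p[0] && f[1] <= p[1]) {
            hits += 1;
        }
    }
    hits as f64 / samples as f64 * reference[0] * reference[1]
}

fn main() {
    // a single front point at (0.5, 0.5) with reference (1, 1) covers
    // a 0.5 x 0.5 box, so the estimate should land near 0.25
    let est = mc_hypervolume(&[[0.5, 0.5]], [1.0, 1.0], 100_000, 9);
    println!("estimate = {est:.3}");
}
```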
Step 3: Are there hard constraints?
heuropt models constraints as a single scalar `constraint_violation`
on each `Evaluation`. Three escalations when the feasible region
is hard to find:
- Penalty-only. Just set `constraint_violation > 0` for infeasible decisions. The default tournament/Pareto comparisons prefer feasibles automatically.
- Repair. Implement `Repair<D>` (or use the provided `ClampToBounds` / `ProjectToSimplex`) to project infeasible decisions back into the feasible region. Pair with a `Variation` in a `CompositeVariation` for bounds-aware variants.
- Stochastic ranking. Use `stochastic_ranking_select` instead of `tournament_select_single_objective`. It probabilistically explores near-feasibility instead of strict feasibility-first ordering, which helps when feasible regions are narrow.
See Constrain your search with Repair
for worked examples.
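For orientation, the simplest repair in the list above is just a componentwise projection. A standalone stand-in for what ClampToBounds does (not its actual implementation, which works through the Repair trait):

```rust
// Componentwise bounds repair: project each coordinate into [lo, hi].
fn clamp_to_bounds(x: &mut [f64], lo: f64, hi: f64) {
    for v in x.iter_mut() {
        *v = v.clamp(lo, hi);
    }
}

fn main() {
    let mut x = vec![-1.5, 0.3, 2.7];
    clamp_to_bounds(&mut x, 0.0, 1.0);
    println!("{x:?}"); // [0.0, 0.3, 1.0]
}
```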
Step 4: Should you parallelize?
Enable the parallel feature flag if your evaluate takes more
than ~50 µs. Population-based algorithms (Random Search, NSGA-II,
Differential Evolution, SPEA2, IBEA, MOPSO, …) batch-
evaluate via rayon when the feature is on. Seeded runs stay
bit-identical to serial mode.
```toml
heuropt = { version = "0.10", features = ["parallel"] }
```
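The shape of what the flag enables is plain batch evaluation of a population. A dependency-free sketch using std scoped threads in place of the rayon parallel iterators the library actually uses:

```rust
// Score a whole population in parallel, one result slot per individual.
// Sketch only: one thread per individual is fine for illustration, but
// a work-stealing pool (rayon) is the real mechanism.
fn batch_evaluate<F>(pop: &[Vec<f64>], eval: F) -> Vec<f64>
where
    F: Fn(&[f64]) -> f64 + Sync,
{
    let mut scores = vec![0.0; pop.len()];
    let eval = &eval;
    std::thread::scope(|s| {
        for (slot, ind) in scores.iter_mut().zip(pop) {
            // each thread writes to its own disjoint slot
            s.spawn(move || *slot = eval(ind));
        }
    });
    scores
}

fn main() {
    let pop: Vec<Vec<f64>> = (0..4).map(|i| vec![i as f64; 3]).collect();
    let scores = batch_evaluate(&pop, |x| x.iter().sum::<f64>());
    println!("{scores:?}"); // [0.0, 3.0, 6.0, 9.0]
}
```

Determinism is preserved because each individual's score lands in a fixed slot regardless of thread scheduling, which is the same reason seeded parallel runs can stay bit-identical to serial ones.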
If your evaluation is IO-bound (HTTP request, RPC, subprocess)
rather than CPU-bound, use the async feature instead — it gives
you AsyncProblem and a run_async(&problem, concurrency).await
method on every algorithm in the catalog. See the
Async evaluation cookbook recipe.
TL;DR table
| Situation | Pick |
|---|---|
| Smooth single-objective continuous | CMA-ES |
| Multimodal single-objective continuous | IPOP-CMA-ES or Differential Evolution |
| Expensive single-objective | Bayesian Optimization or TPE |
| Multi-fidelity single-objective | Hyperband |
| 2- or 3-objective default | MOEA/D (or NSGA-II) |
| Many-objective default | MOEA/D |
| 2-objective real-valued smooth front | MOPSO |
| Disconnected front | IBEA |
| Many-objective, curved front | NSGA-III |
| Many-objective, linear / simplex front | GrEA |
| Permutation problem (TSP with distance matrix) | Ant Colony |
| Generic permutation problem | GA + permutation toolkit |
| Bi-objective combinatorial (TSP / scheduling / knapsack) | NSGA-II + matching encoding operators |
| 3-objective combinatorial | NSGA-III + matching encoding operators |
| Binary problem | UMDA |
| Custom decision type | Simulated Annealing + your Variation |
| Sanity baseline | Random Search |