Choosing an algorithm
The README has a compact decision tree. This chapter expands it with the reasoning behind each branch.
Step 0: How expensive is one evaluation?
This is the first fork because it changes everything that comes after it.
| Eval cost | Budget you can afford | Algorithm family |
|---|---|---|
| Microseconds (pure math) | 10 000 – 1 000 000 evals | Population-based |
| Milliseconds (sim, IO) | 1 000 – 10 000 evals | Population-based |
| Seconds (small training) | 100 – 1 000 evals | Sample-efficient (BO, TPE) |
| Minutes+ (full training) | 50 – 500 evals | Sample-efficient + multi-fidelity |
For the cheap-eval branch, you have the run of the catalog. For the
expensive branch, classical evolutionary methods waste your evaluation
budget — go to Bayesian Optimization or TPE. For the very expensive
branch where each eval has a tunable budget (epochs, MC samples, sim
steps), Hyperband over the PartialProblem trait is the move.
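To make the multi-fidelity idea concrete, this is the successive-halving schedule Hyperband runs inside one bracket: start many configurations on a small budget, keep the best 1/eta at each rung, and multiply the budget by eta. A standalone sketch (the function name and constants are illustrative, not heuropt API):

```rust
// Successive-halving schedule for one Hyperband bracket (illustrative).
// Returns (number of configs, per-config budget) for each rung.
fn halving_schedule(r_max: u32, eta: u32) -> Vec<(u32, u32)> {
    // number of halvings: floor(log_eta(r_max))
    let s = (r_max as f64).log(eta as f64).floor() as u32;
    let n0 = eta.pow(s); // start wide: eta^s configs at the smallest budget
    (0..=s)
        .map(|i| (n0 / eta.pow(i), r_max / eta.pow(s - i)))
        .collect()
}

fn main() {
    for (n, r) in halving_schedule(81, 3) {
        println!("{n} configs x {r} epochs");
    }
}
```

With r_max = 81 and eta = 3 the bracket spends its budget as 81×1, 27×3, 9×9, 3×27, 1×81: most configurations are discarded after seeing only a sliver of the full budget, which is where the savings come from.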
Step 1: How many objectives?
The biggest fork.
- One — there's a single best answer. Pick from the single-objective branch.
- Two or three — a Pareto front. Pick from the multi-objective branch.
- Four or more — a many-objective Pareto front; classical multi-objective methods break down here because almost every pair of points is non-dominated. Pick from the many-objective branch.
Pareto front: the set of decisions where you cannot improve any objective without sacrificing another. In a 2-objective minimize problem, plot every solution; the Pareto front is the lower-left envelope.
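The definition translates directly into code. A naive O(n²) filter for the 2-objective minimize case (illustrative only; the library's non-dominated sorting is the production version):

```rust
// Naive Pareto-front filter for 2-objective minimization (sketch only).
fn dominates(a: &[f64; 2], b: &[f64; 2]) -> bool {
    // a dominates b: no worse on both objectives, strictly better on one
    a[0] <= b[0] && a[1] <= b[1] && (a[0] < b[0] || a[1] < b[1])
}

fn pareto_front(points: &[[f64; 2]]) -> Vec<[f64; 2]> {
    points
        .iter()
        .filter(|&p| !points.iter().any(|q| dominates(q, p)))
        .copied()
        .collect()
}

fn main() {
    let pts = [[1.0, 5.0], [2.0, 2.0], [3.0, 3.0], [5.0, 1.0]];
    // [3, 3] is dominated by [2, 2]; the other three form the lower-left envelope
    println!("{:?}", pareto_front(&pts));
}
```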
If you find yourself staring at a single composite score that's a weighted sum of conflicting goals, you probably have a multi-objective problem in disguise. A weighted sum bakes in your preferences before you've seen the trade-off; running a multi-objective optimizer first and picking off the front later is almost always a better workflow (see Pick one answer off a Pareto front).
Step 2 — single-objective continuous
These all take Vec<f64> decisions.
Smooth, low-to-moderate dimension
CMA-ES is the strong default. It adapts the search distribution's covariance to the local landscape. On the comparison harness it hits machine epsilon on Rosenbrock at 30 000 evaluations.
For very low-dimensional smooth problems (≤ 5 dim), Nelder-Mead is deterministic and converges to f = 0 exactly on Rosenbrock.
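For reference, both claims are about the standard d-dimensional Rosenbrock formulation, whose global minimum is f = 0 at x = (1, …, 1):

```rust
// d-dimensional Rosenbrock: f(x) = sum_i 100*(x[i+1] - x[i]^2)^2 + (1 - x[i])^2
fn rosenbrock(x: &[f64]) -> f64 {
    x.windows(2)
        .map(|w| 100.0 * (w[1] - w[0] * w[0]).powi(2) + (1.0 - w[0]).powi(2))
        .sum()
}

fn main() {
    assert_eq!(rosenbrock(&[1.0; 30]), 0.0); // global minimum at (1, ..., 1)
    println!("f at the origin: {}", rosenbrock(&[0.0; 30]));
}
```

The narrow curved valley between the origin and the minimum is what makes covariance adaptation (CMA-ES) pay off here.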
High dimension, smooth
sNES uses a diagonal covariance — cheaper per step than
CMA-ES at the cost of being unable to model rotated landscapes. Worth
trying when CMA-ES's O(d²) per-step cost hurts.
Multimodal landscapes
Multimodal = many local minima that aren't the global one. Rastrigin and Ackley are classic traps.
IPOP-CMA-ES is CMA-ES with an increasing-population restart strategy specifically designed for this. On the harness it drops vanilla CMA-ES's Rastrigin score from f = 2.35 to f = 0.13.
Differential Evolution is rarely beaten on cheap multimodal
continuous problems. On Rastrigin it ties with (1+1)-ES at f = 0.
Simulated Annealing is a cheap, generic baseline that escapes local optima via temperature decay.
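The temperature-decay mechanism fits in a dozen lines. A deterministic sketch (the cooling rate, step-size rule, and the inlined xorshift RNG are all illustrative choices, not heuropt's SimulatedAnnealing):

```rust
// Bare-bones simulated annealing with geometric cooling, minimizing a
// 1-D multimodal function. Sketch only; constants are illustrative.
fn sa_minimize<F: Fn(f64) -> f64>(f: F, mut x: f64, seed: u64) -> f64 {
    let (mut t, cool) = (1.0_f64, 0.995);
    let mut rng = seed;
    let mut next = || {
        // xorshift64: cheap deterministic pseudo-randoms in [0, 1)
        rng ^= rng << 13; rng ^= rng >> 7; rng ^= rng << 17;
        (rng >> 11) as f64 / (1u64 << 53) as f64
    };
    for _ in 0..20_000 {
        let cand = x + (next() - 0.5) * t * 4.0; // step size shrinks with T
        let delta = f(cand) - f(x);
        // accept downhill always, uphill with probability exp(-delta / T)
        if delta < 0.0 || next() < (-delta / t).exp() {
            x = cand;
        }
        t *= cool;
    }
    x
}

fn main() {
    // Rastrigin-like trap in 1-D: many local minima, global minimum at x = 0
    let f = |x: f64| x * x + 10.0 * (1.0 - (2.0 * std::f64::consts::PI * x).cos());
    let best = sa_minimize(f, 3.5, 42);
    println!("best x = {best:.3}, f = {:.3}", f(best));
}
```

At high temperature the uphill-acceptance probability exp(-delta/T) is close to 1 and the walk escapes basins; as T decays, only improvements survive and the search settles.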
Want something parameter-free
TLBO (Teaching-Learning-Based Optimization) has no F, CR, w,
or σ to tune. Often a respectable middle-of-the-pack performer.
Smallest possible self-adapting baseline
(1+1)-ES — Rechenberg's 1973 scheme with the one-fifth success
rule. On the harness it hits f = 0 on Rastrigin in 50 000
evaluations.
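The whole algorithm is small enough to show in full, which is part of its appeal as a baseline. A self-contained sketch on the sphere function (window length and step-size multipliers are illustrative; the library version differs in details):

```rust
// Minimal (1+1)-ES with Rechenberg's one-fifth success rule on the
// sphere function f(x) = sum x_i^2. Sketch only.
fn one_plus_one_es(dim: usize, iters: usize, seed: u64) -> f64 {
    let sphere = |x: &[f64]| x.iter().map(|v| v * v).sum::<f64>();
    let mut rng = seed;
    let mut gauss = || {
        // Box-Muller over two xorshift uniforms in (0, 1]
        let mut u = || {
            rng ^= rng << 13; rng ^= rng >> 7; rng ^= rng << 17;
            ((rng >> 11) as f64 + 1.0) / (1u64 << 53) as f64
        };
        (-2.0 * u().ln()).sqrt() * (2.0 * std::f64::consts::PI * u()).cos()
    };
    let mut x = vec![1.0; dim];
    let (mut sigma, mut successes) = (0.5, 0usize);
    for i in 1..=iters {
        let cand: Vec<f64> = x.iter().map(|v| v + sigma * gauss()).collect();
        if sphere(&cand) < sphere(&x) {
            x = cand;
            successes += 1;
        }
        if i % 50 == 0 {
            // one-fifth rule: grow sigma if more than 1/5 of the last
            // 50 steps succeeded, shrink it otherwise
            sigma *= if successes * 5 > 50 { 1.5 } else { 0.82 };
            successes = 0;
        }
    }
    sphere(&x)
}

fn main() {
    println!("f after 5000 iters = {:e}", one_plus_one_es(10, 5000, 7));
}
```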
Just want a baseline
Random Search. Useful as a sanity check: if your fancy optimizer can't beat random search, something is wrong (with the fancy optimizer or with the problem).
Step 2 — single-objective other types
| Decision type | Algorithm | Notes |
|---|---|---|
| Vec<bool> | UMDA | Per-bit marginal EDA. Independent-bit assumption. |
| Vec<bool> | GA + BitFlipMutation | When bit interactions matter. |
| Vec<usize> (permutation) | Ant Colony | TSP-style with a distance matrix. |
| Vec<usize> (permutation) | GA + ShuffledPermutation + OrderCrossover + InversionMutation | Generic permutation GA; use EdgeRecombinationCrossover for TSP-shaped instances. |
| Vec<usize> (JSS multiset) | Simulated Annealing / Tabu Search with InsertionMutation, or GA + ShuffledMultisetPermutation + local POX | Operation-string encoding. On the FT06 harness the local-search pair edges out the GA — see Optimize a permutation. |
| Vec<usize> (permutation) | Simulated Annealing + InversionMutation | Strong on sequencing, not just a baseline — wins the harness's FT06 job-shop table and ties for the TSP optimum. |
| Vec<usize> or custom | Tabu Search | You supply the neighbor function; consistently near the top on the TSP and JSS tables. |
| Custom struct | Simulated Annealing / Hill Climber | With your own Variation impl. |
heuropt's permutation operator toolkit covers four crossovers
(OrderCrossover, PartiallyMappedCrossover, CycleCrossover,
EdgeRecombinationCrossover) and four mutations (SwapMutation,
InversionMutation, InsertionMutation, ScrambleMutation),
plus two initializers for strict and multiset permutations. See
Optimize a permutation for the full
picker.
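As a taste of the toolkit, InversionMutation is just a segment reversal. Shown with fixed indices so the effect is visible (the real operator draws i and j at random):

```rust
// What InversionMutation does in isolation: reverse one segment.
fn inversion(perm: &mut [usize], i: usize, j: usize) {
    perm[i..=j].reverse();
}

fn main() {
    let mut tour = vec![0, 1, 2, 3, 4, 5];
    inversion(&mut tour, 1, 4);
    // reversing positions 1..=4 turns 0-1-2-3-4-5 into 0-4-3-2-1-5
    println!("{tour:?}");
}
```

For a TSP tour, a segment reversal replaces exactly two edges (it is the classic 2-opt move), which is why inversion is so effective on sequencing problems.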
Step 2 — multi-objective (2 or 3)
Strong default
MOEA/D is the most consistent performer on the harness. It decomposes the problem into many scalar sub-problems (Tchebycheff or weighted sum) and solves them in parallel — fast per generation, and robust: it finishes top-3 on every multi- and many-objective table (convex, disconnected, spherical and linear fronts; 2 through 10 objectives) and is consistently the fastest or near-fastest. It rarely wins a table outright — a specialist usually does — but it never lands badly. One caveat from the literature: MOEA/D's spread depends on the weight-vector distribution and the scalarizing function, so it can leave gaps on highly irregular or degenerate fronts; the DTLZ/ZDT suite here doesn't stress that.
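For intuition, the Tchebycheff scalarization turns one weight vector w and the ideal point z* into a single scalar to minimize: g(x | w, z*) = max_i w_i |f_i(x) − z*_i|. A sketch of just the aggregation (names are illustrative, not heuropt API):

```rust
// Tchebycheff aggregation: the scalar each MOEA/D sub-problem minimizes.
fn tchebycheff(f: &[f64], w: &[f64], z_star: &[f64]) -> f64 {
    f.iter()
        .zip(w)
        .zip(z_star)
        .map(|((fi, wi), zi)| wi * (fi - zi).abs())
        .fold(f64::NEG_INFINITY, f64::max)
}

fn main() {
    // two candidates against weight (0.5, 0.5) and ideal point (0, 0)
    let g1 = tchebycheff(&[1.0, 3.0], &[0.5, 0.5], &[0.0, 0.0]); // max(0.5, 1.5)
    let g2 = tchebycheff(&[2.0, 2.0], &[0.5, 0.5], &[0.0, 0.0]); // max(1.0, 1.0)
    println!("g1 = {g1}, g2 = {g2}"); // the balanced point wins this sub-problem
}
```

Each weight vector carves out one region of the front; the spread caveat above is exactly that the front coverage inherits whatever distribution the weight vectors have.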
NSGA-II is the other safe default — the canonical Pareto-based
EA: fast, well-understood, diversity-preserving via crowding distance.
On the harness it's edged out by MOEA/D on every multi-objective table
and degrades past ~4 objectives (see the many-objective section), but it
stays a solid 2–3-objective pick and is the established choice for
combinatorial encodings: drop in ShuffledPermutation + a
permutation crossover and it solves bi-objective TSP; drop in a binary
initializer and BitFlipMutation and it solves bi-objective knapsack.
See Multi-objective combinatorial problems.
Real-valued, smooth front, want best convergence
MOPSO (multi-objective PSO with archive). On ZDT1 it wins hypervolume outright and converges 100× tighter than the dominance-based methods.
Better front quality than the default
IBEA (indicator-based) is consistently the best of the dominance-based methods on the harness — wins ZDT3 hypervolume and DTLZ2 mean distance by 24×. It uses an additive ε-indicator for selection rather than dominance + crowding.
SPEA2 (strength + density) — solid alternative; explicit external archive separate from the population.
SMS-EMOA uses exact hypervolume contribution for selection. Elegant in theory; in practice on the harness budgets here it underperforms NSGA-II. Worth the higher per-step cost only when exact HV contribution is the right discriminator.
Disconnected or non-convex front
A disconnected front (separate arcs, like ZDT3) and a non-convex but contiguous front are different problems — don't conflate them.
For a disconnected front, IBEA is the clear pick: on the harness it wins ZDT3 — the disconnected-front benchmark — outright on hypervolume, with MOEA/D and NSGA-II close behind. Counter-intuitively the geometry-aware methods below trail here: estimating a single front geometry or chasing knee points doesn't help when the front is in pieces (on ZDT3, AGE-MOEA and KnEA finish last).
For a non-convex but contiguous front:
AGE-MOEA estimates the front geometry adaptively (the L_p
parameter p is fit from data each generation).
KnEA favors knee points — the regions of the front where small gains in one objective cost large losses in another.
Region-based diversity
PESA-II uses grid hyperboxes to drive selection — divide the objective space into a grid, pick from the least-crowded boxes.
ε-MOEA uses an ε-grid archive that auto-limits its size.
Just one starting decision (no population budget)
PAES — (1+1)-ES with a Pareto archive. Cheap, simple, useful
when your evaluations are expensive enough that you can't afford a
population.
Step 2 — many-objective (4+)
Strong default
MOEA/D again. Decomposition sidesteps the dominance resistance that breaks Pareto-based methods at high objective count — each scalar sub-problem still has a clear best, even when almost every pair of solutions is mutually non-dominated. On the harness it is #2 on every many-objective table (DTLZ2 at 4 and 10 objectives, DTLZ1 at 8), and fast every time. NSGA-II is the cautionary tale: on DTLZ2 at 10 objectives it finishes last — behind random search — because its crowding distance has no dominance signal left to refine.
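Dominance resistance is easy to demonstrate: draw random points in the unit hypercube and count how many pairs are mutually non-dominated. A sketch (point count and the inlined RNG are arbitrary choices):

```rust
// Fraction of mutually non-dominated pairs among uniform random points.
fn dominates(a: &[f64], b: &[f64]) -> bool {
    a.iter().zip(b).all(|(x, y)| x <= y) && a.iter().zip(b).any(|(x, y)| x < y)
}

fn nondominated_pair_fraction(n_points: usize, m_obj: usize, seed: u64) -> f64 {
    let mut rng = seed;
    let mut u = move || {
        // xorshift64 uniform in [0, 1)
        rng ^= rng << 13; rng ^= rng >> 7; rng ^= rng << 17;
        (rng >> 11) as f64 / (1u64 << 53) as f64
    };
    let pts: Vec<Vec<f64>> = (0..n_points)
        .map(|_| (0..m_obj).map(|_| u()).collect())
        .collect();
    let (mut incomparable, mut total) = (0u64, 0u64);
    for i in 0..n_points {
        for j in (i + 1)..n_points {
            total += 1;
            if !dominates(&pts[i], &pts[j]) && !dominates(&pts[j], &pts[i]) {
                incomparable += 1;
            }
        }
    }
    incomparable as f64 / total as f64
}

fn main() {
    for m in [2, 4, 10] {
        println!("{m} objectives: {:.2} of pairs non-dominated",
                 nondominated_pair_fraction(200, m, 1));
    }
}
```

At 2 objectives roughly half of random pairs are incomparable; at 10, nearly all of them are. That is the regime where crowding distance is the only signal NSGA-II has left, and why decomposition or indicators are needed instead.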
Linear / simplex-shaped front (e.g., DTLZ1)
GrEA — grid coords drive ranking. On 3-objective DTLZ1 it beats NSGA-III by 3× and AGE-MOEA by 2.5×, and it wins the 8-objective DTLZ1 table outright.
MOEA/D — also #2 on both DTLZ1 tables.
Curved / unknown front geometry
NSGA-III — reference-point niching; the canonical many-objective method by reputation, though on the harness MOEA/D outperforms it on every table. Reach for it when you specifically want reference-point niching.
AGE-MOEA — estimates L_p geometry per generation.
RVEA — reference vectors with adaptive penalty.
Indicator-based selection
IBEA — additive ε-indicator; doesn't degrade at high obj count.
HypE — Monte Carlo hypervolume estimation; scales to arbitrary objective count where exact HV is too expensive.
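The estimator behind HypE is ordinary Monte Carlo: sample the box under the reference point and count samples dominated by the front. Shown in 2D, where the exact answer is checkable by hand (function name and the inlined RNG are illustrative; HypE's value is that the same estimator scales to 10+ objectives):

```rust
// Monte Carlo hypervolume estimate for a 2-objective minimization front.
fn mc_hypervolume(front: &[[f64; 2]], reference: [f64; 2], samples: u64, seed: u64) -> f64 {
    let mut rng = seed;
    let mut u = move || {
        // xorshift64 uniform in [0, 1)
        rng ^= rng << 13; rng ^= rng >> 7; rng ^= rng << 17;
        (rng >> 11) as f64 / (1u64 << 53) as f64
    };
    let mut hits = 0u64;
    for _ in 0..samples {
        let p = [u() * reference[0], u() * reference[1]];
        // count the sample if some front point dominates it (minimization)
        if front.iter().any(|f| f[0] <= p[0] && f[1] <= p[1]) {
            hits += 1;
        }
    }
    hits as f64 / samples as f64 * reference[0] * reference[1]
}

fn main() {
    // a single front point at (0.5, 0.5) with reference (1, 1) covers
    // a 0.5 x 0.5 box, so the estimate should land near 0.25
    let est = mc_hypervolume(&[[0.5, 0.5]], [1.0, 1.0], 100_000, 9);
    println!("estimate = {est:.3}");
}
```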
Step 3: Are there hard constraints?
heuropt models constraints as a single scalar `constraint_violation`
on each `Evaluation`. Three escalations when the feasible region
is hard to find:
- Penalty-only. Just set `constraint_violation > 0` for infeasible decisions. The default tournament/Pareto comparisons prefer feasibles automatically.
- Repair. Implement `Repair<D>` (or use the provided `ClampToBounds` / `ProjectToSimplex`) to project infeasible decisions back into the feasible region. Pair with a `Variation` in a `CompositeVariation` for bounds-aware variants.
- Stochastic ranking. Use `stochastic_ranking_select` instead of `tournament_select_single_objective`. It probabilistically explores near-feasibility instead of strict feasibility-first ordering, which helps when feasible regions are narrow.
See Constrain your search with Repair
for worked examples.
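For orientation, the simplest repair in the list above is just a componentwise projection. A standalone stand-in for what ClampToBounds does (not its actual implementation, which works through the Repair trait):

```rust
// Componentwise bounds repair: project each coordinate into [lo, hi].
fn clamp_to_bounds(x: &mut [f64], lo: f64, hi: f64) {
    for v in x.iter_mut() {
        *v = v.clamp(lo, hi);
    }
}

fn main() {
    let mut x = vec![-1.5, 0.3, 2.7];
    clamp_to_bounds(&mut x, 0.0, 1.0);
    println!("{x:?}"); // [0.0, 0.3, 1.0]
}
```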
Step 4: Should you parallelize?
Enable the parallel feature flag if your evaluate takes more
than ~50 µs. Population-based algorithms (Random Search, NSGA-II,
Differential Evolution, SPEA2, IBEA, MOPSO, …) batch-
evaluate via rayon when the feature is on. Seeded runs stay
bit-identical to serial mode.
```toml
heuropt = { version = "0.10", features = ["parallel"] }
```
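The shape of what the flag enables is plain batch evaluation of a population. A dependency-free sketch using std scoped threads in place of the rayon parallel iterators the library actually uses:

```rust
// Score a whole population in parallel, one result slot per individual.
// Sketch only: one thread per individual is fine for illustration, but
// a work-stealing pool (rayon) is the real mechanism.
fn batch_evaluate<F>(pop: &[Vec<f64>], eval: F) -> Vec<f64>
where
    F: Fn(&[f64]) -> f64 + Sync,
{
    let mut scores = vec![0.0; pop.len()];
    let eval = &eval;
    std::thread::scope(|s| {
        for (slot, ind) in scores.iter_mut().zip(pop) {
            // each thread writes to its own disjoint slot
            s.spawn(move || *slot = eval(ind));
        }
    });
    scores
}

fn main() {
    let pop: Vec<Vec<f64>> = (0..4).map(|i| vec![i as f64; 3]).collect();
    let scores = batch_evaluate(&pop, |x| x.iter().sum::<f64>());
    println!("{scores:?}"); // [0.0, 3.0, 6.0, 9.0]
}
```

Determinism is preserved because each individual's score lands in a fixed slot regardless of thread scheduling, which is the same reason seeded parallel runs can stay bit-identical to serial ones.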
If your evaluation is IO-bound (HTTP request, RPC, subprocess)
rather than CPU-bound, use the async feature instead — it gives
you AsyncProblem and a run_async(&problem, concurrency).await
method on every algorithm in the catalog. See the
Async evaluation cookbook recipe.
TL;DR table
| Situation | Pick |
|---|---|
| Smooth single-objective continuous | CMA-ES |
| Multimodal single-objective continuous | IPOP-CMA-ES or Differential Evolution |
| Expensive single-objective | Bayesian Optimization or TPE |
| Multi-fidelity single-objective | Hyperband |
| 2- or 3-objective default | MOEA/D (or NSGA-II) |
| Many-objective default | MOEA/D |
| 2-objective real-valued smooth front | MOPSO |
| Disconnected front | IBEA |
| Many-objective, curved front | NSGA-III |
| Many-objective, linear / simplex front | GrEA |
| Permutation problem (TSP with distance matrix) | Ant Colony |
| Generic permutation problem | GA + permutation toolkit |
| Bi-objective combinatorial (TSP / scheduling / knapsack) | NSGA-II + matching encoding operators |
| 3-objective combinatorial | NSGA-III + matching encoding operators |
| Binary problem | UMDA |
| Custom decision type | Simulated Annealing + your Variation |
| Sanity baseline | Random Search |