Choosing an algorithm

The README has a compact decision tree. This chapter expands it with the reasoning behind each branch.

Step 0: How expensive is one evaluation?

This is the first fork because it changes everything that comes after it.

Eval cost                | Budget you can afford    | Algorithm family
Microseconds (pure math) | 10 000 – 1 000 000 evals | Population-based
Milliseconds (sim, IO)   | 1 000 – 10 000 evals     | Population-based
Seconds (small training) | 100 – 1 000 evals        | Sample-efficient (BO, TPE)
Minutes+ (full training) | 50 – 500 evals           | Sample-efficient + multi-fidelity

For the cheap-eval branch, you have the run of the catalog. For the expensive branch, classical evolutionary methods waste your evaluation budget — go to Bayesian Optimization or TPE. For the very expensive branch where each eval has a tunable budget (epochs, MC samples, sim steps), Hyperband over the PartialProblem trait is the move.

Step 1: How many objectives?

The biggest fork.

  • One — there's a single best answer. Pick from the single-objective branch.
  • Two or three — a Pareto front. Pick from the multi-objective branch.
  • Four or more — a many-objective Pareto front; classical multi-objective methods break down here because almost every pair of points is non-dominated. Pick from the many-objective branch.

Pareto front: the set of decisions where you cannot improve any objective without sacrificing another. In a 2-objective minimize problem, plot every solution; the Pareto front is the lower-left envelope.

If you find yourself staring at a single composite score that's a weighted sum of conflicting goals, you probably have a multi-objective problem in disguise. A weighted sum bakes in your preferences before you've seen the trade-off; running a multi-objective optimizer first and picking a point off the front later is almost always the better workflow (see Pick one answer off a Pareto front).

Step 2 — single-objective continuous

These all take Vec<f64> decisions.

Smooth, low-to-moderate dimension

CMA-ES is the strong default. It adapts the search distribution's covariance to the local landscape. On the comparison harness it hits machine epsilon on Rosenbrock at 30 000 evaluations.

For very low-dimensional smooth problems (≤ 5 dim), Nelder-Mead is deterministic and converges to f = 0 exactly on Rosenbrock.

High dimension, smooth

sNES uses a diagonal covariance — cheaper per step than CMA-ES at the cost of being unable to model rotated landscapes. Worth trying when CMA-ES's O(d²) per-step cost hurts.

Multimodal landscapes

Multimodal = many local minima that aren't the global one. Rastrigin and Ackley are classic traps.

IPOP-CMA-ES is CMA-ES with an increasing-population restart strategy specifically designed for this. On the harness it drops vanilla CMA-ES's Rastrigin score from f = 2.35 to f = 0.13.

Differential Evolution is rarely beaten on cheap multimodal continuous problems. On Rastrigin it ties with (1+1)-ES at f = 0.

Simulated Annealing is a cheap, generic baseline that escapes local optima via temperature decay.

Want parameter-free

TLBO (Teaching-Learning-Based Optimization) has no F, CR, w, or σ to tune. Often a respectable middle-of-the-pack performer.

Smallest possible self-adapting baseline

(1+1)-ES — Rechenberg's 1973 (1+1)-ES with the one-fifth success rule. On the harness it hits f = 0 on Rastrigin in 50 000 evaluations.

Just want a baseline

Random Search. Useful as a sanity check: if your fancy optimizer can't beat random search, something is wrong (with the fancy optimizer or with the problem).

Step 2 — single-objective other types

Decision type | Algorithm | Notes
Vec<bool> | UMDA | Per-bit marginal EDA; independent-bit assumption.
Vec<bool> | GA + BitFlipMutation | When bit interactions matter.
Vec<usize> (permutation) | Ant Colony | TSP-style with a distance matrix.
Vec<usize> (permutation) | GA + ShuffledPermutation + OrderCrossover + InversionMutation | Generic permutation GA; use EdgeRecombinationCrossover for TSP-shaped instances.
Vec<usize> (JSS multiset) | Simulated Annealing / Tabu Search with InsertionMutation, or GA + ShuffledMultisetPermutation + local POX | Operation-string encoding. On the FT06 harness the local-search pair edges out the GA — see Optimize a permutation.
Vec<usize> (permutation) | Simulated Annealing + InversionMutation | Strong on sequencing, not just a baseline — wins the harness's FT06 job-shop table and ties for the TSP optimum.
Vec<usize> or custom | Tabu Search | You supply the neighbor function; consistently near the top on the TSP and JSS tables.
Custom struct | Simulated Annealing / Hill Climber | With your own Variation impl.

heuropt's permutation operator toolkit covers four crossovers (OrderCrossover, PartiallyMappedCrossover, CycleCrossover, EdgeRecombinationCrossover) and four mutations (SwapMutation, InversionMutation, InsertionMutation, ScrambleMutation), plus two initializers for strict and multiset permutations. See Optimize a permutation for the full picker.

Step 2 — multi-objective (2 or 3)

Strong default

MOEA/D is the most consistent performer on the harness. It decomposes the problem into many scalar sub-problems (Tchebycheff or weighted sum) and solves them in parallel — fast per generation, and robust: it finishes top-3 on every multi- and many-objective table (convex, disconnected, spherical and linear fronts; 2 through 10 objectives) and is consistently the fastest or near-fastest. It rarely wins a table outright — a specialist usually does — but it never lands badly. One caveat from the literature: MOEA/D's spread depends on the weight-vector distribution and the scalarizing function, so it can leave gaps on highly irregular or degenerate fronts; the DTLZ/ZDT suite here doesn't stress that.

NSGA-II is the other safe default — the canonical Pareto-based EA: fast, well-understood, diversity-preserving via crowding distance. On the harness it's edged out by MOEA/D on every multi-objective table and degrades past ~4 objectives (see the many-objective section), but it stays a solid 2–3-objective pick and is the established choice for combinatorial encodings: drop in ShuffledPermutation + a permutation crossover and it solves bi-objective TSP; drop in a binary initializer and BitFlipMutation and it solves bi-objective knapsack. See Multi-objective combinatorial problems.

Real-valued, smooth front, want best convergence

MOPSO (multi-objective PSO with archive). On ZDT1 it wins hypervolume outright and converges 100× tighter than the dominance-based methods.

Better front quality than the default

IBEA (indicator-based) is consistently the best of the dominance-based methods on the harness — wins ZDT3 hypervolume and DTLZ2 mean distance by 24×. It uses an additive ε-indicator for selection rather than dominance + crowding.

SPEA2 (strength + density) — solid alternative; explicit external archive separate from the population.

SMS-EMOA uses exact hypervolume contribution for selection. Elegant in theory; in practice on the harness budgets here it underperforms NSGA-II. Worth the higher per-step cost only when exact HV contribution is the right discriminator.

Disconnected or non-convex front

A disconnected front (separate arcs, like ZDT3) and a non-convex but contiguous front are different problems — don't conflate them.

For a disconnected front, IBEA is the clear pick: on the harness it wins ZDT3 — the disconnected-front benchmark — outright on hypervolume, with MOEA/D and NSGA-II close behind. Counter-intuitively, the geometry-aware methods below trail here: estimating a single front geometry or chasing knee points doesn't help when the front is in pieces (on ZDT3, AGE-MOEA and KnEA finish last).

For a non-convex but contiguous front:

AGE-MOEA estimates the front geometry adaptively (the L_p parameter p is fit from data each generation).

KnEA favors knee points — the regions of the front where small gains in one objective cost large losses in another.

Region-based diversity

PESA-II uses grid hyperboxes to drive selection — divide the objective space into a grid, pick from the least-crowded boxes.

ε-MOEA uses an ε-grid archive that auto-limits its size.

Just one starting decision (no population budget)

PAES — (1+1)-ES with a Pareto archive. Cheap, simple, useful when your evaluations are expensive enough that you can't afford a population.

Step 2 — many-objective (4+)

Strong default

MOEA/D again. Decomposition sidesteps the dominance resistance that breaks Pareto-based methods at high objective count — each scalar sub-problem still has a clear best, even when almost every pair of solutions is mutually non-dominated. On the harness it is #2 on every many-objective table (DTLZ2 at 4 and 10 objectives, DTLZ1 at 8), and fast every time. NSGA-II is the cautionary tale: on DTLZ2 at 10 objectives it finishes last — behind random search — because its crowding distance has no dominance signal left to refine.

Linear / simplex-shaped front (e.g., DTLZ1)

GrEA — grid coords drive ranking. On 3-objective DTLZ1 it beats NSGA-III by 3× and AGE-MOEA by 2.5×, and it wins the 8-objective DTLZ1 table outright.

MOEA/D — also #2 on both DTLZ1 tables.

Curved / unknown front geometry

NSGA-III — reference-point niching; the canonical many-objective method by reputation, though on the harness MOEA/D outperforms it on every table. Reach for it when you specifically want reference-point niching.

AGE-MOEA — estimates L_p geometry per generation.

RVEA — reference vectors with adaptive penalty.

Indicator-based selection

IBEA — additive ε-indicator; doesn't degrade at high obj count.

HypE — Monte Carlo hypervolume estimation; scales to arbitrary objective count where exact HV is too expensive.

Step 3: Are there hard constraints?

heuropt models constraints as a single scalar constraint_violation on each Evaluation. Three escalations, from cheapest to most involved, when the feasible region is hard to find:

  1. Penalty-only. Just set constraint_violation > 0 for infeasible decisions. The default tournament/Pareto comparisons prefer feasibles automatically.
  2. Repair. Implement Repair<D> (or use the provided ClampToBounds / ProjectToSimplex) to project infeasible decisions back into the feasible region. Pair with a Variation in a CompositeVariation for bounds-aware variants.
  3. Stochastic ranking. Use stochastic_ranking_select instead of tournament_select_single_objective. It probabilistically explores near-feasibility instead of strict feasibility-first ordering, which helps when feasible regions are narrow.

See Constrain your search with Repair for worked examples.

Step 4: Should you parallelize?

Enable the parallel feature flag if your evaluate takes more than ~50 µs. Population-based algorithms (Random Search, NSGA-II, Differential Evolution, SPEA2, IBEA, MOPSO, …) batch-evaluate via rayon when the feature is on. Seeded runs stay bit-identical to serial mode.

heuropt = { version = "0.10", features = ["parallel"] }

If your evaluation is IO-bound (HTTP request, RPC, subprocess) rather than CPU-bound, use the async feature instead — it gives you AsyncProblem and a run_async(&problem, concurrency).await method on every algorithm in the catalog. See the Async evaluation cookbook recipe.

TL;DR table

Situation | Pick
Smooth single-objective continuous | CMA-ES
Multimodal single-objective continuous | IPOP-CMA-ES or Differential Evolution
Expensive single-objective | Bayesian Optimization or TPE
Multi-fidelity single-objective | Hyperband
2- or 3-objective default | MOEA/D (or NSGA-II)
Many-objective default | MOEA/D
2-objective real-valued smooth front | MOPSO
Disconnected front | IBEA
Many-objective, curved front | NSGA-III
Many-objective, linear / simplex front | GrEA
Permutation problem (TSP with distance matrix) | Ant Colony
Generic permutation problem | GA + permutation toolkit
Bi-objective combinatorial (TSP / scheduling / knapsack) | NSGA-II + matching encoding operators
3-objective combinatorial | NSGA-III + matching encoding operators
Binary problem | UMDA
Custom decision type | Simulated Annealing + your Variation
Sanity baseline | Random Search