Parallelize evaluation with rayon

If a single call to your evaluate takes more than ~50 µs, enabling the parallel feature usually pays for itself immediately on population-based algorithms. Each generation evaluates an entire population, and rayon parallelizes that batch.
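
To check where your own evaluate falls relative to the ~50 µs rule of thumb, a rough timing loop is usually enough. The sketch below uses only the standard library; mean_eval_time is a hypothetical helper (not part of heuropt), and the 1 ms sleep stands in for a real objective function.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: average wall-clock time of `evaluate` over `samples` calls.
fn mean_eval_time<F: FnMut()>(mut evaluate: F, samples: u32) -> Duration {
    assert!(samples > 0, "need at least one sample");
    let start = Instant::now();
    for _ in 0..samples {
        evaluate();
    }
    start.elapsed() / samples
}

fn main() {
    // Stand-in workload (assumption): a 1 ms sleep instead of a real evaluate.
    let per_call = mean_eval_time(|| std::thread::sleep(Duration::from_millis(1)), 20);

    // ~50 µs is the rule-of-thumb threshold quoted above.
    let threshold = Duration::from_micros(50);
    if per_call > threshold {
        println!("{per_call:?} per call: the parallel feature is likely worth enabling");
    } else {
        println!("{per_call:?} per call: scheduling overhead may dominate");
    }
}
```

When timing a very cheap pure function this way, wrap its result in std::hint::black_box so the optimizer does not remove the call.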

Enable the feature

[dependencies]
heuropt = { version = "0.10", features = ["parallel"] }

There's nothing else to opt into in your code. The population-evaluation helper is feature-gated: with parallel on it uses rayon's into_par_iter internally; with parallel off it falls back to plain into_iter.
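
As an illustration of what such a feature gate can look like, here is a minimal sketch. The name evaluate_population and its signature are assumptions made for this example, not heuropt's actual internals. Without the parallel feature only the serial variant is compiled, so the block builds on its own; the parallel variant additionally needs rayon as a dependency.

```rust
/// Hypothetical helper (parallel build): evaluates every candidate with rayon.
/// Collecting from an indexed parallel iterator keeps the input order.
#[cfg(feature = "parallel")]
fn evaluate_population<F>(candidates: Vec<Vec<f64>>, evaluate: F) -> Vec<f64>
where
    F: Fn(&[f64]) -> f64 + Sync + Send,
{
    use rayon::prelude::*;
    candidates
        .into_par_iter()
        .map(|x| evaluate(x.as_slice()))
        .collect()
}

/// Hypothetical helper (serial build): same signature, plain iterator.
#[cfg(not(feature = "parallel"))]
fn evaluate_population<F>(candidates: Vec<Vec<f64>>, evaluate: F) -> Vec<f64>
where
    F: Fn(&[f64]) -> f64 + Sync + Send,
{
    candidates
        .into_iter()
        .map(|x| evaluate(x.as_slice()))
        .collect()
}

/// Sum of squares, used here as a cheap stand-in objective.
fn sphere(x: &[f64]) -> f64 {
    x.iter().map(|v| v * v).sum()
}

fn main() {
    let population = vec![vec![1.0, 2.0], vec![0.0, 0.0], vec![3.0, 4.0]];
    // The caller is identical in both builds; only the helper body differs.
    let fitness = evaluate_population(population, sphere);
    println!("{fitness:?}");
}
```

Keeping the Sync + Send bound on both variants means code that compiles with the feature off also compiles with it on.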

Determinism still holds

Seeded runs are bit-identical between the serial and parallel modes. The trick is that population members are evaluated in parallel but assembled back into the same order. Variation, selection, and the RNG are all driven by the main thread, so seed-stability tests still pass.
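
The order-preserving idea can be sketched with the standard library alone. The example below is an illustration under stated assumptions, not heuropt's implementation: Lcg is a toy stand-in for the seeded RNG, std::thread::scope replaces rayon, and evaluate_serial / evaluate_parallel are hypothetical names. Candidates are generated on the main thread, and each worker writes into its own slice of the output, so result i always belongs to candidate i.

```rust
use std::thread;

/// Toy linear congruential generator standing in for the optimizer's seeded RNG.
struct Lcg(u64);

impl Lcg {
    fn next_f64(&mut self) -> f64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        // Map the top 53 bits to [0, 1).
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}

fn sphere(x: &[f64]) -> f64 {
    x.iter().map(|v| v * v).sum()
}

fn evaluate_serial(pop: &[Vec<f64>]) -> Vec<f64> {
    pop.iter().map(|x| sphere(x)).collect()
}

/// Splits the population into contiguous chunks, one scoped thread per chunk.
/// Each thread fills only its matching output chunk, so order is preserved.
fn evaluate_parallel(pop: &[Vec<f64>], workers: usize) -> Vec<f64> {
    let workers = workers.max(1);
    let chunk = ((pop.len() + workers - 1) / workers).max(1);
    let mut out = vec![0.0; pop.len()];
    thread::scope(|s| {
        for (inputs, outputs) in pop.chunks(chunk).zip(out.chunks_mut(chunk)) {
            s.spawn(move || {
                for (x, y) in inputs.iter().zip(outputs.iter_mut()) {
                    *y = sphere(x);
                }
            });
        }
    });
    out
}

fn main() {
    // All random draws happen on the main thread, before any evaluation.
    let mut rng = Lcg(42);
    let pop: Vec<Vec<f64>> = (0..16)
        .map(|_| (0..5).map(|_| rng.next_f64() * 2.0 - 1.0).collect())
        .collect();

    let serial = evaluate_serial(&pop);
    let parallel = evaluate_parallel(&pop, 4);
    assert_eq!(serial, parallel);
    println!("serial and parallel results match for all {} candidates", pop.len());
}
```

The equality check is exact because each candidate is evaluated by the same sequence of floating-point operations in both paths; only the thread that runs them differs.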

Which algorithms benefit

Algorithms with a per-generation evaluate_batch:

Steady-state algorithms (PAES, Simulated Annealing, Hill Climber, (1+1)-ES) only evaluate one or a few candidates per iteration, so the parallel feature gives them nothing — leave it off if those are your primary optimizers.

Worked example

The Sphere problem is too cheap to actually benefit from parallelism — this example just shows the shape. In real workloads evaluate is the expensive bit (a simulation, a model fit, an HTTP call).

use heuropt::prelude::*;

struct ExpensiveSphere;
impl Problem for ExpensiveSphere {
    type Decision = Vec<f64>;
    fn objectives(&self) -> ObjectiveSpace {
        ObjectiveSpace::new(vec![Objective::minimize("f")])
    }
    fn evaluate(&self, x: &Vec<f64>) -> Evaluation {
        // Pretend this is a 5 ms simulation.
        std::thread::sleep(std::time::Duration::from_millis(5));
        Evaluation::new(vec![x.iter().map(|v| v * v).sum::<f64>()])
    }
}

fn main() {
    let bounds = vec![(-1.0_f64, 1.0_f64); 5];
    let mut opt = DifferentialEvolution::new(
        DifferentialEvolutionConfig {
            population_size: 16,
            generations: 50,
            differential_weight: 0.5,
            crossover_probability: 0.9,
            seed: 42,
        },
        RealBounds::new(bounds),
    );
    let r = opt.run(&ExpensiveSphere);
    println!("best f = {}", r.best.unwrap().evaluation.objectives[0]);
}

With the parallel feature on, each generation's 16 evaluations run across rayon's worker threads. On a 16-core machine the wall-clock cost per generation drops from 16 × 5 ms = 80 ms to roughly 5 ms + scheduling overhead.
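
The arithmetic generalizes to other thread counts. The sketch below is an idealized model, assuming every evaluation takes the same time and ignoring scheduling overhead; generation_wall_clock_ms is a hypothetical helper, not a heuropt function. A generation runs in ceil(population / threads) waves of one evaluation each.

```rust
/// Idealized wall-clock time per generation, in milliseconds.
/// Assumes uniform evaluation time and zero scheduling overhead.
fn generation_wall_clock_ms(population: usize, threads: usize, eval_ms: f64) -> f64 {
    let threads = threads.max(1);
    // Number of sequential "waves" needed: ceil(population / threads).
    let waves = (population + threads - 1) / threads;
    waves as f64 * eval_ms
}

fn main() {
    // Population of 16 and 5 ms per evaluation, as in the example above.
    for threads in [1, 4, 8, 16] {
        println!(
            "{threads:>2} threads: {} ms per generation",
            generation_wall_clock_ms(16, threads, 5.0)
        );
    }
}
```

Under this model the example costs 80 ms, 20 ms, 10 ms, and 5 ms per generation on 1, 4, 8, and 16 threads. Threads beyond the population size add nothing, since a generation has only 16 evaluations to hand out.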

Sizing your thread pool

heuropt uses rayon's global thread pool. Override the size with:

rayon::ThreadPoolBuilder::new().num_threads(8).build_global().unwrap();

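Run this once, before any heuropt call; build_global returns an error if the global pool has already been initialized. To scope the setting instead, build a local pool and run the optimizer inside rayon's install API.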

When parallelism doesn't help

  • Your evaluate is sub-microsecond (Sphere, Rastrigin, Ackley unweighted) — the rayon scheduling overhead exceeds the work.
  • You're already running multiple seeds in parallel at the harness level (see Compare two algorithms). Stacking parallelism rarely helps.
  • The algorithm is steady-state (PAES, SA, hill climber).

parallel vs async

If your evaluate is…                 Use
CPU-bound (math, simulation)         parallel feature (this recipe)
IO-bound (HTTP, RPC, subprocess)     async feature → see Async evaluation

Both can be on at once if your evaluation does both substantial CPU work and IO. The two features are independent.