General reductions on host
Benchmark results before these changes (default build): tnl-benchmark-blas.baseline.html
Benchmark results before these changes (build with --optimize-vector-host-operations=true): tnl-benchmark-blas.host_unroll_prefetch.html
Benchmark results after these changes (default build): tnl-benchmark-blas.final.html