Single precision floating point numbers for LIMPET
I have been investigating using single precision (f32) numbers for LIMPET computations. This could offer significant speedups at the cost of reduced numerical precision.
I tried two things:
- Store only the lookup tables (LUTs) as f32, so only LUT interpolation is done in f32 and everything else stays f64. Speedup shown in blue on the bar plot, compared to the baseline (everything in f64).
- Store both the LUTs and the state variables as f32. This means almost all operations can be done using f32 instructions. Speedup shown in red on the bar plot.
A working version of openCARP with f32 LIMPET enabled by default is available on the branch `lut-f32-inmemory`. It needs to be built with MLIR.
I attached a PDF to this post with plots of the potential over time for each model, comparing the output of the baseline (all f64), LUTs-only f32, and both LUTs and state variables f32.
The speedup is very interesting for some models, and the error seems acceptable for most. There are a few ways this could be integrated into openCARP: a runtime switch for the user (which means we would have to compile both the f32 and f64 versions, making the build process heavier), or enabling f32 only for some models (maybe with a switch in easyML). I think it mostly depends on how acceptable the error is. Maybe @axel.loewe @edward.vigmond you have an idea which direction we could go?
These are the speedups I get using bench with 1000 cells for 1000 ms and a time step of 0.01. The benchmarks were run on a single core with AVX512 (16 floats per vector operation).

```
bench --bin --validate --no-trace -I <Model> -n 1000 -a 1000
```