MLIR code generation for ROCm (AMD GPUs)
This MR mainly changes the build process of physics/limpet to allow MLIR code generation for AMD GPUs (ROCm) in addition to NVIDIA GPUs (CUDA).
Only very minor changes to the original GPU code generation were needed (mainly, stack allocations are now always performed at the start of the GPU kernels). Some build files and function names have been renamed because they were CUDA-specific.
Changes have also been made to the various places where GPU host code is written (calls to the CUDA library) to call the HIP library when relevant.
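
As an illustration of the host-side switch described above, here is a minimal sketch of a thin wrapper that forwards to either HIP or CUDA at compile time. It is not the code from this MR: the `EXAMPLE_USE_HIP` / `EXAMPLE_USE_CUDA` macros and the `gpu*` wrapper names are hypothetical, and the host-only fallback exists only so the sketch compiles without a GPU toolchain.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

#if defined(EXAMPLE_USE_HIP)            // hypothetical macro, not the MR's
  #include <hip/hip_runtime.h>
  using gpuError_t = hipError_t;
  constexpr gpuError_t gpuSuccess = hipSuccess;
  inline gpuError_t gpuMalloc(void** p, std::size_t n) { return hipMalloc(p, n); }
  inline gpuError_t gpuFree(void* p) { return hipFree(p); }
  inline gpuError_t gpuCopyToDevice(void* dst, const void* src, std::size_t n) {
      return hipMemcpy(dst, src, n, hipMemcpyHostToDevice);
  }
  inline gpuError_t gpuCopyToHost(void* dst, const void* src, std::size_t n) {
      return hipMemcpy(dst, src, n, hipMemcpyDeviceToHost);
  }
#elif defined(EXAMPLE_USE_CUDA)         // hypothetical macro, not the MR's
  #include <cuda_runtime.h>
  using gpuError_t = cudaError_t;
  constexpr gpuError_t gpuSuccess = cudaSuccess;
  inline gpuError_t gpuMalloc(void** p, std::size_t n) { return cudaMalloc(p, n); }
  inline gpuError_t gpuFree(void* p) { return cudaFree(p); }
  inline gpuError_t gpuCopyToDevice(void* dst, const void* src, std::size_t n) {
      return cudaMemcpy(dst, src, n, cudaMemcpyHostToDevice);
  }
  inline gpuError_t gpuCopyToHost(void* dst, const void* src, std::size_t n) {
      return cudaMemcpy(dst, src, n, cudaMemcpyDeviceToHost);
  }
#else
  // Host-only fallback so the sketch builds without any GPU SDK installed.
  using gpuError_t = int;
  constexpr gpuError_t gpuSuccess = 0;
  inline gpuError_t gpuMalloc(void** p, std::size_t n) {
      *p = std::malloc(n);
      return *p ? gpuSuccess : 1;
  }
  inline gpuError_t gpuFree(void* p) { std::free(p); return gpuSuccess; }
  inline gpuError_t gpuCopyToDevice(void* dst, const void* src, std::size_t n) {
      std::memcpy(dst, src, n);
      return gpuSuccess;
  }
  inline gpuError_t gpuCopyToHost(void* dst, const void* src, std::size_t n) {
      std::memcpy(dst, src, n);
      return gpuSuccess;
  }
#endif

// Abort (in debug builds) if a runtime call did not succeed.
inline void gpuCheck(gpuError_t err) { assert(err == gpuSuccess); }
```

With a wrapper of this kind, calling code only uses the `gpu*` names, and the backend is chosen by a single compile definition.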
GPU models link against a custom opencarp_(rocm|cuda)_runtime library, a modified version of the default MLIR runtimes with some performance optimizations for single-GPU execution (module and stream caching).
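
To make the module caching idea concrete, here is a minimal sketch of a cache that loads each GPU module image once and reuses the handle on later calls. It is an assumption about how such caching could look, not the actual opencarp runtime code: `ModuleCache`, `ModuleHandle`, and the injected loader are hypothetical names, and in a real runtime the loader would wrap a driver call such as `hipModuleLoadData` or `cuModuleLoadData`. A stream cache could follow the same pattern with a single lazily created stream.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a driver module handle (hipModule_t / CUmodule).
using ModuleHandle = void*;

class ModuleCache {
public:
    // The loader turns a module image (binary blob) into a loaded module.
    using Loader = std::function<ModuleHandle(const void*)>;

    explicit ModuleCache(Loader loader) : loader_(std::move(loader)) {
        assert(loader_ && "loader must be callable");
    }

    // Return the handle for this image, loading it only on first use.
    // The cache key is the address of the image, assuming each kernel
    // module is embedded once in the host binary at a fixed address.
    ModuleHandle get(const void* image) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = cache_.find(image);
        if (it != cache_.end()) {
            return it->second;          // cache hit: no driver call
        }
        ModuleHandle handle = loader_(image);
        cache_.emplace(image, handle);
        return handle;
    }

    // Number of distinct module images loaded so far.
    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return cache_.size();
    }

private:
    Loader loader_;
    mutable std::mutex mutex_;
    std::unordered_map<const void*, ModuleHandle> cache_;
};
```

The benefit in a single-GPU run is that repeated kernel launches skip the module load, which the stock wrappers would otherwise repeat on every launch.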