On RISC-V, the cycle CSR holds a 64-bit count of the number of clock
cycles executed by the core, from an arbitrary point in the past. This
matches the intended semantics of @llvm.readcyclecounter(), which we
currently leave to the default lowering (to the constant 0).

With this patch, we lower this intrinsic with the intended semantics,
using the user-space instruction rdcycle. On 64-bit targets, we can
lower directly to this instruction.

On 32-bit targets, we need to do more, as rdcycle only returns the low
32 bits of the cycle CSR. In this case, we perform a custom lowering,
based on the PowerPC lowering, using rdcycleh to obtain the high
32 bits of the cycle CSR. This custom lowering inserts a new basic
block which detects whether the high 32 bits of the cycle CSR changed
during reading (because multiple instructions are required to read, the
low half can overflow into the high half between reads). The
emitted assembly matches the suggested assembly in the RISC-V
specification.