Since the scheduling resources for reductions and ordered reductions now
account for LMUL and SEW, we can modify the Latency and ResourceCycles
for these resources.
- Most reductions take a total of approx vl*SEW/DLEN + 5*(4 + log2(DLEN/SEW)) cycles.
- Ordered floating-point reductions take a total of approx 5*vl cycles.
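
As a rough illustration of the two estimates above, here is a minimal sketch that evaluates them for one configuration. The function names and the example values (DLEN = 256, SEW = 32, vl = 16) are assumptions chosen for illustration only; they are not taken from the patch or from any particular scheduling model.

```python
import math


def reduction_cycles(vl, sew, dlen):
    """Approximate total cycles for most (unordered) reductions:
    vl*SEW/DLEN + 5*(4 + log2(DLEN/SEW))."""
    return vl * sew / dlen + 5 * (4 + math.log2(dlen / sew))


def ordered_fp_reduction_cycles(vl):
    """Approximate total cycles for an ordered floating-point reduction: 5*vl."""
    return 5 * vl


# Hypothetical configuration: DLEN = 256 bits, SEW = 32 bits, vl = 16 elements.
print(reduction_cycles(vl=16, sew=32, dlen=256))  # 2 + 5*(4 + 3) = 37.0
print(ordered_fp_reduction_cycles(vl=16))         # 5*16 = 80
```

The unordered estimate grows slowly with vl because only the vl*SEW/DLEN term depends on it, while the ordered estimate scales linearly with vl.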
I don't think we usually indent after a let if the let doesn't use braces.