Add an HLFIR operation for the SUM transformational intrinsic, according
to the design set out in flang/docs/HighLevelFIR.md.
I decided to make hlfir.sum lenient about the form of its
arguments. This allows the sum intrinsic to be lowered to only this HLFIR
operation, without needing several operations to convert and box
arguments. Having only one operation generated for the intrinsic
invocation should make optimization passes on HLFIR simpler.
However, the DIM argument must be passed as a value that has already
been loaded from memory (pointers and references are not accepted).
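
As a rough illustration of the leniency described above, the sketch below
shows the same operation taking ARRAY either as a boxed variable or as an
HLFIR expression, with DIM as a plain integer value. The textual syntax and
the value names (%array_box, %array_expr, %dim) are assumptions made for
illustration and should be checked against the operation definition in this
patch.

```mlir
// Hypothetical sketch: assembly syntax and SSA value names are assumed.
// ARRAY passed as a boxed variable; DIM is an already-loaded i32 value.
%r0 = hlfir.sum %array_box dim %dim : (!fir.box<!fir.array<?x?xi32>>, i32) -> !hlfir.expr<?xi32>

// ARRAY passed directly as an HLFIR expression; no separate conversion
// or boxing operation is emitted before the sum.
%r1 = hlfir.sum %array_expr dim %dim : (!hlfir.expr<?x?xi32>, i32) -> !hlfir.expr<?xi32>
```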
In lowering so far, I have not allowed the generation of array expressions of i1 type (hlfir.expr<?xi1>), although I did not add that restriction to the hlfir.expr verifier (maybe I should). The rationale is that scalar i1 values are fine to handle, but arrays of i1 are problematic after bufferization because arrays are placed in memory. Storing i1 in memory would require an extra array temporary whenever the array is passed to the runtime or to another function, since it would first need to be converted to an array of logical type.