A recent change enabled dropping unit-trip loops of "reduction" iterator
type as well. This is only valid as long as at least one other "reduction"
iterator remains in the operation. Without one, the initial value of the
output (the value of out) is no longer read, which produces incorrect results.
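A minimal Python sketch of the rule described above, assuming iterator kinds are modeled as the strings "parallel" and "reduction". It is not the MLIR implementation. The helper `droppable_unit_dims` is hypothetical, and so is its specific policy of keeping the last remaining unit-trip reduction. The two `reduce_rows*` functions are hypothetical too. They show why losing the read of out changes the result.

```python
def droppable_unit_dims(iterator_types, loop_ranges):
    """Hypothetical policy: drop unit-trip loops, but never drop the last
    remaining "reduction" loop, so the op still reads its out operand."""
    remaining_reductions = iterator_types.count("reduction")
    dropped = []
    for dim, (kind, extent) in enumerate(zip(iterator_types, loop_ranges)):
        if extent != 1:
            continue  # only unit-trip loops are candidates
        if kind == "reduction":
            if remaining_reductions <= 1:
                continue  # keep this one: it is the last reduction loop
            remaining_reductions -= 1
        dropped.append(dim)
    return dropped


def reduce_rows(a, init):
    """Reference semantics: out[i] = init[i] + sum over k of a[i][k]."""
    out = list(init)  # the reduction starts from the value of out
    for i, row in enumerate(a):
        for x in row:  # "reduction" loop over k
            out[i] += x
    return out


def reduce_rows_init_ignored(a, init):
    """Buggy rewrite (hypothetical): the unit-trip k loop is removed and the
    initial value of out is never read."""
    return [row[0] for row in a]


a = [[1.0], [2.0], [3.0]]  # k has a single iteration
init = [10.0, 20.0, 30.0]
print(reduce_rows(a, init))               # [11.0, 22.0, 33.0]
print(reduce_rows_init_ignored(a, init))  # [1.0, 2.0, 3.0]
```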

Also fix a bug in the fill -> tensor_reshape folding: the out operand of
the fill must be reshaped to obtain the out operand of the generated fill
operation.
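A small Python model of the corrected folding, assuming a fill's result takes its shape from its out operand. Tensors here are a hypothetical `Tensor` dataclass holding a shape and row-major data. The `fill`, `reshape`, and `fold_reshape_of_fill` functions are illustrative stand-ins, not MLIR APIs. The check shows that `reshape(fill(v, out))` matches `fill(v, reshape(out))`. Reusing the unreshaped out would give a result with the wrong shape.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Tensor:
    shape: tuple
    data: tuple  # row-major flat storage


def fill(value, out):
    # The result type (shape) comes from the out operand.
    return Tensor(out.shape, (value,) * len(out.data))


def reshape(t, shape):
    assert prod(shape) == prod(t.shape), "reshape must preserve element count"
    return Tensor(tuple(shape), t.data)


def fold_reshape_of_fill(value, out, shape):
    # Folded form: reshape the out operand first, then fill the reshaped value.
    return fill(value, reshape(out, shape))


out = Tensor((2, 3), (0,) * 6)
original = reshape(fill(7, out), (6,))
folded = fold_reshape_of_fill(7, out, (6,))
assert original == folded
# The buggy fold, fill(7, out), would keep shape (2, 3) instead of (6,).
```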