Current tileConsumerAndFuseProducers method allows for only
replacing the uses of the tiled consumer, and not the fused producers.
Based on how fusion works currently, it is not always possible/easy to
return a tensor value from the tiled loop nest to replace the uses of
the fused producers since the same tile of the fused producer might be
recomputed while computing multiple tiles of the consumer op. This
patch adds an option that will allow the transformation to return the
tensor replacements for the fused producer ops, but expects the caller
to control the tile sizes so that the tiles of the fused producer are
computed only once, by controlling which loops are tiled. There might
be an analysis that can determine this, but for now this left to
control from the caller.