While dense tensors support random accesses, it is critical to visit them in a row-major order for better cache locality. However, we previously consider dense inputs and outputs together when computing constraints for building iteration graph, it could lead us to less efficient iteration graphs.
This patch adds a new SortMask::kIncludeDenseInput to treat dense inputs/outputs separately when building iteration graph, thus increasing the chance for use to construct a better iteration graph.
A more fine-grained approach is to treat each input separately.
Note, related to:
https://github.com/llvm/llvm-project/issues/51651
I would organize these now as follows
or, if you don't like having them in the same place,
define the individual bits in the SortMask and define some consts for the "sets"