Refactor the iteration graph code a bit so that the loop order can be computed incrementally and constraints that would create cycles can be skipped. This makes it possible to apply constraints in a more fine-grained way. The current implementation increases the asymptotic complexity of the procedure, so it may need some optimization. Some other improvements I have in mind:
- Add loop type constraints (filter < parallel < reduction) via edges as well, rather than enforcing them implicitly.
- Relax constraints for dense tensors so that we only require the innermost dimension to be last. As far as I can tell, there is otherwise no need for a specific ordering between the dimensions.
- Add constraints for a single dense tensor as a group (i.e. only add them if all of them can be satisfied).
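To illustrate the idea, here is a minimal Python sketch (not the actual MLIR implementation; all names are hypothetical) of incrementally adding ordering constraints as edges, skipping any that would create a cycle, and then reading off a loop order via a topological sort:

```python
# Hypothetical sketch of cycle-skipping incremental constraint insertion.
# Loops are numbered 0..n-1; a constraint (before, after) is a directed edge.

class IterationGraph:
    def __init__(self, num_loops):
        self.n = num_loops
        self.succ = {i: set() for i in range(num_loops)}

    def _reaches(self, src, dst):
        # DFS reachability check: can we get from src to dst?
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.succ[node])
        return False

    def try_add_constraint(self, before, after):
        # Adding before -> after creates a cycle iff 'before' is already
        # reachable from 'after'; in that case skip the constraint.
        if self._reaches(after, before):
            return False
        self.succ[before].add(after)
        return True

    def loop_order(self):
        # Kahn's algorithm; always succeeds because skipped edges
        # keep the graph acyclic.
        indeg = {i: 0 for i in range(self.n)}
        for u in self.succ:
            for v in self.succ[u]:
                indeg[v] += 1
        ready = [i for i in range(self.n) if indeg[i] == 0]
        order = []
        while ready:
            u = ready.pop()
            order.append(u)
            for v in self.succ[u]:
                indeg[v] -= 1
                if indeg[v] == 0:
                    ready.append(v)
        return order

g = IterationGraph(3)
g.try_add_constraint(0, 1)   # accepted
g.try_add_constraint(1, 2)   # accepted
g.try_add_constraint(2, 0)   # skipped: would close the cycle 0->1->2->0
```

Each insertion pays a reachability check (a DFS over the current graph), which is where the increase in asymptotic complexity mentioned above comes from; an incremental cycle-detection structure could bring that cost down if it becomes a problem.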
At the moment, running check-mlir fails one test. This is because some of the Undef constraints can now be applied for conv2d, which moves the sparse loop further out.
Update:
I changed the conv2d check to match the code that is now generated.