This is an archive of the discontinued LLVM Phabricator instance.

[mlir][Affine] Enable fusion of loops with vector loads/stores
Closed · Public

Authored by dcaballe on Jun 1 2020, 6:30 PM.

Details

Summary

This patch enables affine loop fusion for loops that contain affine vector loads
and stores. To do so, we only had to switch LoopFusionUtils.cpp and Utils.cpp
over to the affine memory op interfaces, so that vector loads and stores are
taken into account as well.
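To make the point about interfaces concrete, here is a minimal, self-contained C++ model. It does not use MLIR: `Op`, `ReadOpInterface`, `ScalarLoad`, `VectorLoad`, `OtherOp`, and the counting functions are hypothetical stand-ins, and `dynamic_cast` only roughly plays the role that op casting plays in the real passes. An analysis that matches one concrete op type silently skips the vector load, while one that queries the interface sees both kinds of load.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for IR operations (not the real MLIR classes).
struct Op {
  virtual ~Op() = default;
};

// Hypothetical "read" interface: any op that reads from a named memref.
struct ReadOpInterface {
  virtual ~ReadOpInterface() = default;
  virtual const std::string& memref() const = 0;
};

struct ScalarLoad : Op, ReadOpInterface {
  std::string m;
  explicit ScalarLoad(std::string name) : m(std::move(name)) {}
  const std::string& memref() const override { return m; }
};

struct VectorLoad : Op, ReadOpInterface {
  std::string m;
  int width;  // number of contiguous elements loaded
  VectorLoad(std::string name, int w) : m(std::move(name)), width(w) {}
  const std::string& memref() const override { return m; }
};

struct OtherOp : Op {};  // does not access memory

using Body = std::vector<std::unique_ptr<Op>>;

// Before: the analysis matches one concrete op type, so vector loads are missed.
int countReadsByConcreteType(const Body& ops) {
  int n = 0;
  for (const auto& op : ops)
    if (dynamic_cast<const ScalarLoad*>(op.get())) ++n;
  return n;
}

// After: the analysis queries the interface, so every implementer is seen.
int countReadsByInterface(const Body& ops) {
  int n = 0;
  for (const auto& op : ops)
    if (dynamic_cast<const ReadOpInterface*>(op.get())) ++n;
  return n;
}

// A loop body with one scalar load, one vector load, and an unrelated op.
Body makeExampleBody() {
  Body body;
  body.push_back(std::make_unique<ScalarLoad>("A"));
  body.push_back(std::make_unique<VectorLoad>("A", 4));
  body.push_back(std::make_unique<OtherOp>());
  return body;
}
```

On the body from `makeExampleBody()`, `countReadsByConcreteType` finds one read and `countReadsByInterface` finds two. New memory ops only need to implement the interface to become visible to the interface-based analysis; the concrete-type version has to be edited for each one.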


Event Timeline

dcaballe created this revision. Jun 1 2020, 6:30 PM
Herald added a project: Restricted Project. Jun 1 2020, 6:30 PM
ftynse accepted this revision. Jun 2 2020, 8:58 AM

This should be a canonical example of why analyses/passes should use interfaces

This revision is now accepted and ready to land. Jun 2 2020, 8:58 AM
andydavis1 accepted this revision. Jun 2 2020, 10:29 AM

> This should be a canonical example of why analyses/passes should use interfaces

Indeed! Nice :)

This revision was automatically updated to reflect the committed changes.

This is really great! The same could be done for memref store-to-load forwarding, and it should work out of the box with affine.vector_load/affine.vector_store. But there's an issue with this (here and in general) if you mix vector loads/stores with regular loads/stores on the same memref. The slices computed by fusion aren't aware that a vector load/store accesses a larger range of data than the single element at its index, so the computed regions will be inaccurate at the boundaries. You are fine as long as no memref is accessed by vector loads/stores in one loop nest and by scalar loads/stores in another. With this patch, if you mix the two, fusion will currently produce incorrect output, e.g., when the producer nest has scalar stores and the consumer nest has vector loads: the slice won't contain all the data the consumer needs.
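The boundary problem can be reproduced with a small model. Assume, hypothetically, a producer nest that does scalar stores `A[i]` for `i` in `[0, 16)` and a consumer nest that, for `j` in `[0, 4)`, does a 4-wide vector load starting at `A[4 * j]`. The sketch below is plain C++, not MLIR's slice computation, and its function names are made up for illustration. It compares the producer slice derived from the load's start index alone with the slice derived from the load's full vector footprint.

```cpp
#include <algorithm>
#include <cassert>

// Closed integer interval [lb, ub]; empty when ub < lb.
struct Interval {
  long lb, ub;
};

// Elements of A read by consumer iteration j when only the load's start
// index (stride * j) is considered, i.e. the vector width is ignored.
Interval readRegionIndexOnly(long j, long stride) {
  return {stride * j, stride * j};
}

// Elements of A actually read by consumer iteration j: a vector load of
// `width` contiguous elements starting at stride * j.
Interval readRegionVectorAware(long j, long stride, long width) {
  return {stride * j, stride * j + width - 1};
}

// The producer stores A[i] for i in [0, n), so the producer iterations the
// slice must contain are those whose store lands in `region`.
Interval producerSlice(Interval region, long n) {
  return {std::max(region.lb, 0L), std::min(region.ub, n - 1)};
}

// Number of producer iterations in a slice (0 if empty).
long sliceSize(Interval s) { return s.ub < s.lb ? 0 : s.ub - s.lb + 1; }
```

For consumer iteration `j = 2`, the index-only slice contains just producer iteration 8, whereas the vector load reads `A[8]` through `A[11]`. In the fused nest, `A[9]`, `A[10]`, and `A[11]` would therefore not have been written by the time the vector load reads them.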

This can be fixed by updating the dependence information and the memref region computation to account for vector loads/stores. Both would be inaccurate after this revision (and, of course, they would be inaccurate anyway if vector loads/stores were ignored entirely).
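One possible direction for the region side of this fix is to widen each vector access's region by its vector shape. The following is a hedged sketch, not the MLIR implementation: `accessedRegion` is a hypothetical helper operating on per-dimension bounds of an access's start indices, and it assumes the vector's shape maps onto the trailing dimensions of the memref (as with a 1-D vector read along the innermost dimension).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Closed integer interval [lb, ub] for one memref dimension.
struct Interval {
  long lb, ub;
};

// Hypothetical helper: given per-dimension bounds of the *start* indices of
// an access (what an index-only region computation yields) and the vector
// shape of the access (empty for scalar ops), return per-dimension bounds of
// the elements actually touched. Assumes the vector shape maps onto the
// trailing memref dimensions.
std::vector<Interval> accessedRegion(std::vector<Interval> startBounds,
                                     const std::vector<long>& vectorShape) {
  assert(vectorShape.size() <= startBounds.size());
  std::size_t offset = startBounds.size() - vectorShape.size();
  // Each trailing dimension grows by (vector extent - 1) at its upper end.
  for (std::size_t d = 0; d < vectorShape.size(); ++d)
    startBounds[offset + d].ub += vectorShape[d] - 1;
  return startBounds;
}
```

For a 4-wide vector load on a 2-D memref whose start indices span rows `[0, 9]` and columns `[0, 28]`, the touched region spans rows `[0, 9]` and columns `[0, 31]`; a scalar access (empty shape) keeps its bounds unchanged. Dependence checks would need a similar widening, which this sketch does not cover.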

Thanks, Uday! Good point. I haven't looked closely at the slice computation, but if you can send me some pointers and examples of how the regions should be computed, I can take a look. Thanks!