When handling sub-byte emulation, the sizes of the converted memrefs
also need to be updated (the current implementation does not do
this). This adds the complexity of having to
linearize the memrefs as well. Consider a memref<3x3xi4> where the
i4 elements are packed. This has an overall size of 5 bytes (36 bits,
rounded up to a whole number of bytes). This can only be represented by a
memref<5xi8>. A memref<3x2xi8> would imply an implicit padding of
4 bits at the end of each row. This patch therefore incorporates
linearization into the sub-byte load-store emulation.
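
The size arithmetic above can be checked with a small standalone sketch.
This is not the code in the patch: the helper names (packedSizeInBytes,
rowPaddedSizeInBytes, locate) are hypothetical, and the sketch assumes
static shapes and an element width that divides 8, so that no element
straddles a byte boundary.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Bytes needed when all elements are packed back to back, i.e. the
// memref is linearized before being converted to i8 storage.
int64_t packedSizeInBytes(const std::vector<int64_t> &shape, int64_t elemBits) {
  int64_t numElements = 1;
  for (int64_t dim : shape)
    numElements *= dim;
  return (numElements * elemBits + 7) / 8; // round up to whole bytes
}

// Bytes needed when only the innermost dimension is packed, so each row
// is rounded up to a byte boundary on its own (implicit row padding).
int64_t rowPaddedSizeInBytes(const std::vector<int64_t> &shape, int64_t elemBits) {
  if (shape.empty())
    return (elemBits + 7) / 8;
  int64_t numRows = 1;
  for (std::size_t i = 0; i + 1 < shape.size(); ++i)
    numRows *= shape[i];
  int64_t rowBytes = (shape.back() * elemBits + 7) / 8;
  return numRows * rowBytes;
}

// Position of one element inside the linearized, packed byte buffer.
struct PackedLocation {
  int64_t byteIndex; // index into the i8 storage
  int64_t bitOffset; // bit position of the element within that byte
};

// Row-major linearization of `indices`, then conversion to byte/bit position.
PackedLocation locate(const std::vector<int64_t> &shape,
                      const std::vector<int64_t> &indices, int64_t elemBits) {
  int64_t linearIndex = 0;
  for (std::size_t i = 0; i < shape.size(); ++i)
    linearIndex = linearIndex * shape[i] + indices[i];
  int64_t bitPosition = linearIndex * elemBits;
  return {bitPosition / 8, bitPosition % 8};
}
```

For memref<3x3xi4>, packedSizeInBytes({3, 3}, 4) gives 5, matching
memref<5xi8>, while rowPaddedSizeInBytes({3, 3}, 4) gives 6, matching
memref<3x2xi8> with 4 bits of padding per row. Element [1, 0] sits at
linear index 3, so locate places it in byte 1 at bit offset 4.
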
This patch also updates some of the utility functions to make better
use of statically available information using OpFoldResult and
makeComposedFoldedAffineApplyOps.
Can we just break it into two functions? Looking through the comment and usage, these read to me as two methods sharing one interface. Breaking it into two functions and using each one where it applies helps readability a lot: people won't be confused about why std::ignore is used, and they don't have to go back to the comments. The method name should just tell them what they get.