Although the target may support unaligned access, it's likely
still better to increase the alignment.
Details
- Reviewers
• tstellarAMD jlebar escha asbirlea
Diff Detail
Event Timeline
accessIsMisaligned returns false if the TTI reports that the access is not "fast". But here we seem to be ignoring that and aligning the access anyway.
Is there prior art for ignoring TTI and assuming that unaligned accesses are slower than aligned accesses? Alternatively, could you just make your target return false for "fast"?
I think what "fast" means is ambiguous. The way this is used changes the meaning of fast in the context of the uses. For example in DAGCombiner, this is used to check if it's OK to merge a store if it creates a less-aligned vector store. That's a bit different than the question of changing the underlying alignment. The unaligned operation might be faster to some unaligned expansion, but that doesn't mean it's still not better still to have the aligned access
lib/Transforms/Vectorize/LoadStoreVectorizer.cpp | ||
---|---|---|
822 | If the target does not support unaligned access of the type, the condition below may exit without vectorizing, but after having changed the alignment for the alloca. |
The unaligned operation might be faster to some unaligned expansion, but that doesn't mean it's still not better still to have the aligned access
Indeed, but there's a cost to increasing the alignment too, right? In particular, it prevents us from packing our stack as efficiently.
It's an assumption of this pass that vectorization is always beneficial. So of course we want to increase the alignment when the access would otherwise be illegal. And it also seems reasonable to increase the alignment when the TTI tells us the access would not be "fast". But increasing the alignment on the mere suspicion that it would be faster, without any indication from the target that it would in fact be an improvement...I am less comfortable with that.
An alternative would be to add a new late-IR pass that increases the alignment of all allocas and, on your targets, run that after the LSV. Then we'd be making a target-specific decision to enable the pass.
If the target does not support unaligned access of the type, the condition below may exit without vectorizing, but after having changed the alignment for the alloca.
Is this the intended behavior?