This patch adds custom lowering for sign- and any-extended loads of v2i8, v4i8 and v2i16. Instead of generating multiple loads followed by vector inserts, we now generate a single scalar load followed by a vector shuffle. This work was adapted from r213897 (the corresponding patch for the X86 target).
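To illustrate the idea outside of SelectionDAG, here is a minimal sketch in plain C++ that models a sign-extending load of two i8 elements. It is not the patch's code: the function names and the choice of a 32-bit result lane are assumptions made for illustration. The first function mirrors the old form (one load per element, then an insert per lane); the second does one 16-bit scalar load and then splits and extends the lanes, which is roughly what the scalar-load-plus-shuffle form achieves.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical model of the old lowering: one byte load per element,
// each result inserted into its own lane.
std::array<std::int32_t, 2> sextLoadPerElement(const unsigned char *p) {
  std::array<std::int32_t, 2> out{};
  for (int i = 0; i < 2; ++i) {
    std::int8_t elt;
    std::memcpy(&elt, p + i, sizeof elt); // separate load for each lane
    out[i] = elt;                         // implicit sign extension to 32 bits
  }
  return out;
}

// Hypothetical model of the new lowering: a single 16-bit scalar load,
// followed by splitting the scalar into lanes and sign-extending them.
std::array<std::int32_t, 2> sextLoadViaScalar(const unsigned char *p) {
  std::uint16_t bits;
  std::memcpy(&bits, p, sizeof bits);     // the only memory access

  // Reinterpret the scalar as two byte lanes. Copying the bytes back keeps
  // the lane order identical to memory order regardless of endianness.
  std::int8_t lanes[2];
  std::memcpy(lanes, &bits, sizeof bits);

  return {static_cast<std::int32_t>(lanes[0]),
          static_cast<std::int32_t>(lanes[1])};
}
```

Both functions return the same lanes for the same input; the difference is only in how many memory accesses are modeled.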
Hi Chad,
A couple of comments.
Thanks,
James
lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
2334 | Is this guaranteed to iterate in size order? Perhaps we should add a comment indicating that we rely upon this? (and an assert that the size of every item is greater than that of the preceding item?) | |
2343 | I'm worried as to how this assert can fire. The logic for getting here looks the same for extload and sextload, so how is this OK for extloads and not for sextloads (and where do sextloads that would fire this assert get filtered out?) | |
2390 | Why can't we zextload here (as well as sext and extloading)? Is there an instruction missing in the ISA for zexting? |
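The ordering check suggested for line 2334 could look something like the sketch below. The loop in the patch is not shown in this thread, so everything here is an assumption: the candidates are modeled as a plain list of sizes in bits, and `isStrictlyIncreasing` and `firstSizeAtLeast` are hypothetical names, not LLVM APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Returns true if every size is greater than the one before it.
// std::adjacent_find with greater_equal locates the first pair (a, b)
// where a >= b, so reaching end() means the order is strictly increasing.
bool isStrictlyIncreasing(const std::vector<unsigned> &sizes) {
  return std::adjacent_find(sizes.begin(), sizes.end(),
                            std::greater_equal<unsigned>()) == sizes.end();
}

// Hypothetical stand-in for a loop that relies on size order: it returns
// the first candidate that is at least `needed` bits wide, or 0 if none is.
unsigned firstSizeAtLeast(const std::vector<unsigned> &sizes, unsigned needed) {
  // We rely on the candidates being visited from smallest to largest.
  assert(isStrictlyIncreasing(sizes) &&
         "candidate sizes must be strictly increasing");
  for (unsigned size : sizes)
    if (size >= needed)
      return size;
  return 0;
}
```

With the assert in place, a reordering of the candidate list fails loudly in asserts-enabled builds instead of silently picking a wider type than necessary.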
Just getting this off my radar and I don't think Matt plans on continuing this work. If that's not the case, feel free to add me back as a reviewer.
Thanks, Chad. I have no plans for this patch as of now. I think the small performance improvement I thought I saw with it was due to alignment instead. We can always resurrect it if needed. And thanks, James, for the initial review!