It should be enabled only when the load alignment is at least 8 bytes.
- Repository
- rG LLVM Github Monorepo
Event Timeline
Inline comment on llvm/test/CodeGen/AMDGPU/bfi_int.ll, lines 13–14: You can just use -DAG here
Looks good to me. As a further enhancement you could call isDereferenceable on the MachineMemOperand to see if the extra 4 bytes that a widened load would access are guaranteed to be dereferenceable.
Here's a quick demo of this, on top of your patch: https://reviews.llvm.org/differential/diff/301868/
It fixes all the regressions in kernel-args.ll and store-local.96.ll.
If a dwordx3 load is 8-byte aligned, then the three words loaded will be at addresses that are:
first word: 8-byte aligned
second word: 8-byte aligned + 4
third word: 8-byte aligned
Adding a fourth word to this load puts it at an address that is 8-byte aligned + 4, so it shares an 8-byte-aligned block with the third word. That guarantees both are in the same page, assuming your page size is a power of two greater than 4 bytes. So it can't cause any new page faults.
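The page argument above can be checked by brute force. The following is a standalone sketch, not LLVM code: the function names `pageOf`, `widenedLoadStaysInPage` and `holdsForAllBases` are made up for illustration, and it assumes 4-byte dwords and a power-of-two page size.

```cpp
#include <cassert>
#include <cstdint>

// Index of the page containing byte address Addr.
// Assumption: PageSize is a non-zero power of two.
inline uint64_t pageOf(uint64_t Addr, uint64_t PageSize) {
  assert(PageSize != 0 && (PageSize & (PageSize - 1)) == 0 &&
         "page size must be a power of two");
  return Addr / PageSize;
}

// True if the fourth dword added by widening a dwordx3 load at Base lies
// entirely in the same page as the third dword of the original load.
inline bool widenedLoadStaysInPage(uint64_t Base, uint64_t PageSize) {
  uint64_t ThirdWord = Base + 8;      // first byte of the third word
  uint64_t FourthWordEnd = Base + 15; // last byte of the added fourth word
  return pageOf(ThirdWord, PageSize) == pageOf(FourthWordEnd, PageSize);
}

// Checks every base address in [0, Limit) that is a multiple of Alignment.
inline bool holdsForAllBases(uint64_t Alignment, uint64_t PageSize,
                             uint64_t Limit) {
  for (uint64_t Base = 0; Base < Limit; Base += Alignment)
    if (!widenedLoadStaysInPage(Base, PageSize))
      return false;
  return true;
}
```

With 8-byte alignment the property holds for every base, while with only 4-byte alignment a base such as 4084 with 4096-byte pages places the fourth word in the next page, which is why the alignment requirement matters.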
LGTM. As a further cleanup I wonder if it would be better to have a WidenOrSplitVectorLoad function which does the "NumElements == 3 && (Alignment >= 8 || Is16ByteKnownDereferenceable)" check for you.
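As a rough illustration of the condition quoted above, here is a standalone sketch of the predicate such a helper might wrap. The name `shouldWidenVec3Load` and its parameters are hypothetical; this is not the actual LLVM function or its signature, and the real code would take the alignment and dereferenceability from the load's memory operand rather than from plain arguments.

```cpp
#include <cstdint>

// Hypothetical predicate: decides whether a 3-element vector load may be
// widened to 4 elements. Mirrors the check from the review comment:
//   NumElements == 3 && (Alignment >= 8 || Is16ByteKnownDereferenceable)
inline bool shouldWidenVec3Load(unsigned NumElements,
                                uint64_t AlignmentInBytes,
                                bool Is16ByteKnownDereferenceable) {
  if (NumElements != 3)
    return false; // only dwordx3-style loads are candidates for widening
  // Either the alignment rules out a new page fault, or the extra 4 bytes
  // are already known to be safe to read.
  return AlignmentInBytes >= 8 || Is16ByteKnownDereferenceable;
}
```

A caller would widen when the predicate returns true and fall back to splitting otherwise, which keeps the two conditions in one place instead of repeating them at each call site.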