This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add alignment check for v3 to v4 load type promotion
ClosedPublic

Authored by cdevadas on Oct 29 2020, 9:09 AM.

Details

Summary

It should be enabled only when the load alignment is at least 8-byte.

Diff Detail

Event Timeline

cdevadas created this revision.Oct 29 2020, 9:09 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 29 2020, 9:09 AM
cdevadas requested review of this revision.Oct 29 2020, 9:09 AM
arsenm added inline comments.Oct 29 2020, 9:11 AM
llvm/test/CodeGen/AMDGPU/bfi_int.ll
11–16

You can just use -DAG here

cdevadas updated this revision to Diff 301661.Oct 29 2020, 9:44 AM

Used -DAG to fix the test.

foad added a subscriber: foad.Oct 29 2020, 10:07 AM
cdevadas updated this revision to Diff 301824.Oct 30 2020, 12:38 AM
cdevadas edited the summary of this revision. (Show Details)

Made the alignment check to 8-byte boundary instead of 16-byte.

foad added a comment.Oct 30 2020, 5:28 AM

Looks good to me. As a further enhancement you could call isDereferenceable on the MachineMemOperand to see if the extra 4 bytes that a widened load would access are guaranteed to be dereferenceable.

foad added a comment.Oct 30 2020, 5:49 AM

As a further enhancement you could call isDereferenceable on the MachineMemOperand to see if the extra 4 bytes that a widened load would access are guaranteed to be dereferenceable.

Here's a quick demo of this, on top of your patch: https://reviews.llvm.org/differential/diff/301868/
It fixes all the regressions in kernel-args.ll and store-local.96.ll.

I'm not sure how I follow that 8 byte alignment is sufficient

foad added a comment.Oct 30 2020, 10:14 AM

I'm not sure how I follow that 8 byte alignment is sufficient

If a dwordx3 load is 8-byte aligned then the three words loaded will be at addresses that are:
first word: 8 byte aligned
second word: 8 byte aligned + 4
third word: 8 byte aligned

Adding a fourth word to this load will put it at an address that is 8 byte aligned + 4, which is guaranteed to be in the same page as the third word, assuming your page size is greater than 4 bytes. So it can't cause any new page faults.

cdevadas updated this revision to Diff 301970.Oct 30 2020, 11:49 AM

Addressed the suggestion by @foad to include the Dereferenceable check on MMO.

foad accepted this revision.Oct 30 2020, 1:07 PM

LGTM. As a further cleanup I wonder if it would be better to have a WidenOrSplitVectorLoad function which does the "NumElements == 3 && (Alignment >= 8 || Is16ByteKnownDereferenceable)" check for you.

This revision is now accepted and ready to land.Oct 30 2020, 1:07 PM
This revision was landed with ongoing or failed builds.Nov 1 2020, 12:34 AM
This revision was automatically updated to reflect the committed changes.