
AMDGPU: Don't report 2-byte alignment as fast
ClosedPublic

Authored by arsenm on Feb 10 2020, 10:26 AM.

Details

Reviewers
nhaehnle
rampitec
Summary

2-byte alignment is apparently worse than 1-byte alignment. This change
does not attempt to decompose 2-byte aligned wide stores, but it stops
trying to produce them.
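As a rough illustration of the idea, here is a minimal standalone sketch. It is not the AMDGPU lowering code: the names `AccessInfo` and `isReportedFast` are hypothetical, and the exact policy (an access wider than 2 bytes with exactly 2-byte alignment is reported as not fast, everything else is left as fast) is an assumption made for this example. The real conditions in the backend may differ.

```cpp
#include <cassert>

// Hypothetical description of a memory access; not an LLVM type.
struct AccessInfo {
  unsigned sizeBytes;   // width of the access in bytes
  unsigned alignBytes;  // known alignment of the address in bytes
};

// Toy model of a "is this access fast?" query (assumed policy, see lead-in).
bool isReportedFast(const AccessInfo &access) {
  // Naturally aligned accesses (including a 2-byte aligned 2-byte access)
  // are reported as fast.
  if (access.alignBytes >= access.sizeBytes)
    return true;
  // A wide access that is only 2-byte aligned is no longer reported as fast.
  if (access.alignBytes == 2)
    return false;
  // Other under-aligned cases are left as "fast" in this toy model.
  return true;
}

// Small usage example for the cases discussed above.
static void checkExamples() {
  assert(isReportedFast({2, 2}));   // 2-byte access, 2-byte aligned
  assert(!isReportedFast({4, 2}));  // 4-byte access, 2-byte aligned
  assert(isReportedFast({4, 4}));   // 4-byte access, 4-byte aligned
}
```

Under this model, a client such as a vectorizer that consults the query would stop forming wide accesses from 2-byte aligned pieces, which matches the "stop trying to produce them" part of the summary without touching accesses that already exist.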

Also fix a bug in LoadStoreVectorizer, which was decreasing the alignment
and vectorizing stack accesses. It assumed that a stack object was an
alloca whose base alignment could be changed, which is not true
if the pointer is derived from a function argument.
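The distinction between an alloca and an argument-derived pointer can be sketched as follows. This is a simplified standalone model, not the LoadStoreVectorizer code: `BaseKind`, `BaseObject`, and `tryRaiseAlignment` are hypothetical names introduced for illustration only.

```cpp
#include <cassert>

// Hypothetical classification of the object a stack pointer is based on.
enum class BaseKind { Alloca, Argument };

// Hypothetical model of the underlying object and its known alignment.
struct BaseObject {
  BaseKind kind;
  unsigned alignBytes;
};

// Returns true if the base is (or can be made) aligned to `wantedBytes`.
// Only an alloca is owned by the current function, so only an alloca may
// have its alignment increased. An argument's alignment is fixed by the caller.
bool tryRaiseAlignment(BaseObject &base, unsigned wantedBytes) {
  if (base.alignBytes >= wantedBytes)
    return true;   // already sufficiently aligned
  if (base.kind != BaseKind::Alloca)
    return false;  // cannot change alignment of caller-provided memory
  base.alignBytes = wantedBytes;
  return true;
}
```

For example, calling `tryRaiseAlignment` on an `Alloca` base with alignment 1 and a wanted alignment of 4 succeeds and updates the object, while the same call on an `Argument` base fails and leaves its alignment untouched, so a vectorizer following this rule would not widen that access.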

Diff Detail

Event Timeline

arsenm created this revision. Feb 10 2020, 10:26 AM
Herald added a project: Restricted Project. Feb 10 2020, 10:26 AM
rampitec added inline comments. Feb 10 2020, 11:57 AM
llvm/test/CodeGen/AMDGPU/chain-hi-to-lo.ll, line 210

It does not seem to be better to have 3 global_load_ushort than global_load_ushort + global_load_dword.

rampitec accepted this revision. Feb 10 2020, 12:19 PM

LGTM after some HW clarification.

This revision is now accepted and ready to land. Feb 10 2020, 12:19 PM